PREFACE TO THE FIFTH EDITION


This book describes statistical models and methods for analyzing discrete time series and 
presents important applications of the methodology. The models considered include the 
class of autoregressive integrated moving average (ARIMA) models and various extensions 
of these models. The properties of the models are examined and statistical methods for 
model specification, parameter estimation, and model checking are presented. Applications 
to forecasting nonseasonal as well as seasonal time series are described. Extensions of the 
methodology to transfer function modeling of dynamic relationships between two or more 
time series, modeling the effects of intervention events, multivariate time series modeling, 
and process control are discussed. Topics such as state-space and structural modeling, 
nonlinear models, long-memory models, and conditionally heteroscedastic models are 
also covered. The goal has been to provide a text that is practical and of value to both 
academicians and practitioners. 

The first edition of this book appeared in 1970 and around that time there was a great 
upsurge in research on time series analysis and forecasting. This generated a large influx of 
new ideas, modifications, and improvements by many authors. For example, several new 
research directions began to emerge in econometrics around that time, leading to what is 
now known as time series econometrics. Many of these developments were reflected in the 
fourth edition of this book and have been further elaborated upon in this new edition. 

The main goals of preparing a new edition have been to expand and update earlier 
material, incorporate new literature, enhance and update numerical illustrations through 
the use of R, and increase the number of exercises in the book. Some of the chapters in 
the previous edition have been reorganized. For example, Chapter 14 on multivariate time 
series analysis has been reorganized and expanded, placing more emphasis on vector
autoregressive (VAR) models. The VAR models are by far the most widely used multivariate
time series models in applied work. This edition provides an expanded treatment of these 
models that includes software demonstrations. 

Chapter 10 has also been expanded and updated. This chapter covers selected topics in 
time series analysis that either extend or supplement material discussed in earlier chapters. 
This includes unit root testing, modeling of conditional heteroscedasticity, nonlinear models,
and long memory models. A section on unit root testing that appeared in Chapter 7 of the
previous edition has been expanded and moved to Section 10.1 in this edition. Section 10.2 
deals with autoregressive conditionally heteroscedastic models, such as the ARCH and 
GARCH models. These models focus on the variability in a time series and are useful for 
modeling the volatility or variability in economic and financial series, in particular. The 
treatment of the ARCH and GARCH models has been expanded and several extensions 
have been added. 

Elsewhere in the text, the exposition has been enhanced by revising, modifying, and 
omitting text as appropriate. Several tables have either been edited or replaced by graphs 
to make the presentation more effective. The number of exercises has been increased 
throughout the text and they now appear at the end of each chapter. 

A further enhancement to this edition is the use of the statistical software R for model 
building and forecasting. The R package is available as a free download from the R Project 
for Statistical Computing at www.r-project.org. A brief description of the software is given 
in Appendix A1.1 of Chapter 1. Graphs generated using R now appear in many of the
chapters along with R code that will help the reader reconstruct the graphs. The software 
is also used for numerical illustration in many of the examples in the text. 

The fourth edition of this book was published by Wiley in 2008. Plans for a new edition 
began during the fall of 2012. I was deeply honored when George Box asked me to help him
with this update. George was my Ph.D. advisor at the University of Wisconsin-Madison 
and remained a dear friend to me over the years as he did to all his students. Sadly, he was 
rather ill when the plans for this new edition were finalized towards the end of 2012. He 
did not have a chance to see the project completed as he passed away in March of 2013. I
am deeply grateful for the opportunity to work with him and for the confidence he showed 
in assigning me this task. The book is dedicated to his memory and to the memory of his 
distinguished co-authors Gwilym Jenkins and Gregory Reinsel. Their contributions were 
many and they are all missed. 

I also want to express my gratitude to several friends and colleagues in the time series 
community who have read the manuscript and provided helpful comments and suggestions. 
These include Ruey Tsay, William Wei, Sung Ahn, and Raja Velu who have read Chapter 14 
on multivariate time series analysis, and David Dickey, Johannes Ledolter, Timo Terasvirta, 
and Niels Haldrup who have read Chapter 10 on special topics. Their constructive comments 
and suggestions are much appreciated. Assistance and support from Paul Lindholm in 
Finland is also gratefully acknowledged. The use of R in this edition includes packages 
developed for existing books on time series analysis such as Cryer and Chan (2010), 
Shumway and Stoffer (2011), and Tsay (2014). We commend these authors for making
their code and datasets available for public use through the R Project. 

Research for the original version of this book was supported by the Air Force Office of 
Scientific Research and by the British Science Research Council. Research incorporated 
in the third edition was partially supported by the Alfred P. Sloan Foundation and by the 
National Aeronautics and Space Administration. Permission to reprint selected tables from 
Biometrika Tables for Statisticians, Vol. 1, edited by E. S. Pearson and H. O. Hartley is 
also acknowledged. On behalf of my co-authors, I would like to thank George Tiao, David 
Mayne, David Pierce, Granville Tunnicliffe Wilson, Donald Watts, John Hampton, Elaine 
Hodkinson, Patricia Blant, Dean Wichern, David Bacon, Paul Newbold, Hiro Kanemasu, 
Larry Haugh, John MacGregor, Bovas Abraham, Johannes Ledolter, Gina Chen, Raja
Velu, Sung Ahn, Michael Wincek, Carole Leigh, Mary Esser, Sandy Reinsel, and
Meg Jenkins, for their help, in many different ways, in preparing the earlier editions.
A very special thanks is extended to Claire Box for her long-time help and support. 

The guidance and editorial support of Jon Gurstelle and Sari Friedman at Wiley is 
gratefully acknowledged. We also thank Stephen Quigley for his help in setting up the 
project, and Katrina Maceda and Shikha Pahuja for their help with the production. 

Finally, I want to express my gratitude to my husband Bert Beander for his encouragement
and support during the preparation of this revision.


Greta M. Ljung 

Lexington, MA 
May 2015 




PREFACE TO THE FOURTH EDITION 


It may be of interest to briefly recount how this book came to be written. Gwilym Jenkins 
and I first became friends in the late 1950s. We were intrigued by an idea that a chemical 
reactor could be designed that optimized itself automatically and could follow a moving 
maximum. We both believed that many advances in statistical theory came about as a result 
of interaction with researchers who were working on real scientific problems. Helping to 
design and build such a reactor would present an opportunity to further demonstrate this 
concept. 

When Gwilym Jenkins came to visit Madison for a year, we discussed the idea with 
the famous chemical engineer Olaf Hougen, then in his eighties. He was enthusiastic and 
suggested that we form a small team in a joint project to build such a system. The National 
Science Foundation later supported this project. It took 3 years, but suffice it to say, that 
after many experiments, several setbacks, and some successes the reactor was built and it 
worked. 

As expected, this investigation taught us a lot. In particular, we acquired proficiency in 
the manipulation of difference equations that were needed to characterize the dynamics of 
the system. It also gave us a better understanding of nonstationary time series required for 
realistic modeling of system noise. This was a happy time. We were doing what we most 
enjoyed doing: interacting with experimenters in the evolution of ideas and the solution of 
real problems, with real apparatus and real data. 

Later there was fallout in other contexts, for example, advances in time series analysis, 
in forecasting for business and economics, and also developments in statistical process 
control (SPC) using some notions learned from the engineers. 

Originally Gwilym came for a year. After that I spent each summer with him in England 
at his home in Lancaster. For the rest of the year, we corresponded using small reel-to-reel 
tape recorders. We wrote a number of technical reports and published some papers but 
eventually realized we needed a book. The first two editions of this book were written 
during a period in which Gwilym was, with extraordinary courage, fighting a debilitating 
illness to which he succumbed sometime after the book had been completed. 
Later Gregory Reinsel, who had profound knowledge of the subject, helped to complete
the third edition. Also in this fourth edition, produced after his untimely death, the new 
material is almost entirely his. In addition to a complete revision and updating, this fourth 
edition resulted in two new chapters: Chapter 10 on nonlinear and long memory models 
and Chapter 12 on multivariate time series. 

This book should be regarded as a tribute to Gwilym and Gregory. 

I was especially blessed to work with two such gifted colleagues. 


George E. P. Box 


Madison, Wisconsin 
March 2008 



PREFACE TO THE THIRD EDITION 


This book is concerned with the building of stochastic (statistical) models for time series 
and their use in important areas of application. This includes the topics of forecasting, 
model specification, estimation, and checking, transfer function modeling of dynamic 
relationships, modeling the effects of intervention events, and process control. Coincident 
with the first publication of Time Series Analysis: Forecasting and Control, there was a
great upsurge in research in these topics. Thus, while the fundamental principles of the kind 
of time series analysis presented in that edition have remained the same, there has been a 
great influx of new ideas, modifications, and improvements provided by many authors. 

The earlier editions of this book were written during a period in which Gwilym Jenkins 
was, with extraordinary courage, fighting a slowly debilitating illness. In the present revision,
dedicated to his memory, we have preserved the general structure of the original book
while revising, modifying, and omitting text where appropriate. In particular, Chapter 7 
on estimation of ARMA models has been considerably modified. In addition, we have 
introduced entirely new sections on some important topics that have evolved since the 
first edition. These include presentations on various more recently developed methods for 
model specification, such as canonical correlation analysis and the use of model selection 
criteria, results on testing for unit root nonstationarity in ARIMA processes, the state-space 
representation of ARMA models and its use for likelihood estimation and forecasting, score 
tests for model checking, structural components, and deterministic components in time series
models and their estimation based on regression-time series model methods. A new
chapter (12) has been developed on the important topic of intervention and outlier analysis, 
reflecting the substantial interest and research in this topic since the earlier editions. 

Over the last few years, the new emphasis on industrial quality improvement has strongly 
focused attention on the role of control both in process monitoring and in process adjustment.
The control section of this book has, therefore, been completely rewritten to serve
as an introduction to these important topics and to provide a better understanding of 
their relationship. 

The objective of this book is to provide practical techniques that will be available to 
most of the wide audience who could benefit from their use. While we have tried to remove 
the inadequacies of earlier editions, we have not attempted to produce here a rigorous 
mathematical treatment of the subject. 

We wish to acknowledge our indebtedness to Meg (Margaret) Jenkins and to our wives, 
Claire and Sandy, for their continuing support and assistance throughout the long period 
of preparation of this revision. 

Research on which the original book was based was supported by the Air Force Office 
of Scientific Research and by the British Science Research Council. Research incorporated 
in the third edition was partially supported by the Alfred P. Sloan Foundation and by the 
National Aeronautics and Space Administration. We are grateful to Professor E. S. Pearson 
and the Biometrika Trustees for permission to reprint condensed and adapted forms of 
Tables 1, 8, and 12 of Biometrika Tables for Statisticians, Vol. 1, edited by E. S. Pearson 
and H. O. Hartley, to Dr. Casimer Stralkowski for permission to reproduce and adapt 
three figures from his doctoral thesis, and to George Tiao, David Mayne, Emanuel Parzen, 
David Pierce, Granville Wilson, Donald Watts, John Hampton, Elaine Hodkinson, Patricia 
Blant, Dean Wichern, David Bacon, Paul Newbold, Hiro Kanemasu, Larry Haugh, John 
MacGregor, Bovas Abraham, Gina Chen, Johannes Ledolter, Greta Ljung, Carole Leigh, 
Mary Esser, and Meg Jenkins for their help, in many different ways, in preparing the 
earlier editions. 


George Box and Gregory Reinsel 
INTRODUCTION


A time series is a sequence of observations taken sequentially in time. Many sets of data 
appear as time series: a monthly sequence of the quantity of goods shipped from a factory, a 
weekly series of the number of road accidents, daily rainfall amounts, hourly observations 
made on the yield of a chemical process, and so on. Examples of time series abound in 
such fields as economics, business, engineering, the natural sciences (especially geophysics 
and meteorology), and the social sciences. Examples of data of the kind that we will be 
concerned with are displayed as time series plots in Figures 2.1 and 4.1. An intrinsic 
feature of a time series is that, typically, adjacent observations are dependent. The nature 
of this dependence among observations of a time series is of considerable practical interest. 
Time series analysis is concerned with techniques for the analysis of this dependence. This 
requires the development of stochastic and dynamic models for time series data and the use 
of such models in important areas of application. 

In the subsequent chapters of this book, we present methods for building, identifying, 
fitting, and checking models for time series and dynamic systems. The methods discussed 
are appropriate for discrete (sampled-data) systems, where observation of the system occurs 
at equally spaced intervals of time. 

We illustrate the use of these time series and dynamic models in five important areas of 
application: 

1. The forecasting of future values of a time series from current and past values. 

2. The determination of the transfer function of a system subject to inertia, that is, the
determination of a dynamic input-output model that can show the effect on the output of
a system of any given series of inputs.

3. The use of indicator input variables in transfer function models to represent and 
assess the effects of unusual intervention events on the behavior of a time series. 


Time Series Analysis: Forecasting and Control, Fifth Edition. George E. P. Box, Gwilym M. Jenkins, 
4. The examination of interrelationships among several related time series variables of
interest and determination of appropriate multivariate dynamic models to represent 
these joint relationships among the variables over time. 

5. The design of simple control schemes by means of which potential deviations of 
the system output from a desired target may, so far as possible, be compensated by 
adjustment of the input series values. 


1.1 FIVE IMPORTANT PRACTICAL PROBLEMS 

1.1.1 Forecasting Time Series 

The use at time t of available observations from a time series to forecast its value at some 
future time t + l can provide a basis for (1) economic and business planning, (2) production 
planning, (3) inventory and production control, and (4) control and optimization of industrial 
processes. As originally described by Holt et al. (1963), Brown (1962), and the Imperial 
Chemical Industries (ICI) monograph on short term forecasting (Coutie, 1964), forecasts 
are usually needed over a period known as the lead time, which varies with each problem.
For example, the lead time in the inventory control problem was defined by Harrison (1965) 
as a period that begins when an order to replenish stock is placed with the factory and lasts 
until the order is delivered into stock. 

We will assume that observations are available at discrete, equispaced intervals of
time. For example, in a sales forecasting problem, the sales $z_t$ in the current month $t$ and
the sales $z_{t-1}, z_{t-2}, z_{t-3}, \ldots$ in previous months might be used to forecast sales for lead
times $l = 1, 2, 3, \ldots, 12$ months ahead. Denote by $\hat{z}_t(l)$ the forecast made at origin $t$ of
the sales $z_{t+l}$ at some future time $t + l$, that is, at lead time $l$. The function $\hat{z}_t(l)$, which
provides the forecasts at origin $t$ for all future lead times, based on the available information
from the current and previous values $z_t, z_{t-1}, z_{t-2}, z_{t-3}, \ldots$ through time $t$, will be called the
forecast function at origin $t$. Our objective is to obtain a forecast function such that the mean
square of the deviations $z_{t+l} - \hat{z}_t(l)$ between the actual and forecasted values is as small as
possible for each lead time $l$.

In addition to calculating the best forecasts, it is also necessary to specify their accuracy, 
so that, for example, the risks associated with decisions based upon the forecasts may 
be calculated. The accuracy of the forecasts may be expressed by calculating probability 
limits on either side of each forecast. These limits may be calculated for any convenient 
set of probabilities, for example, 50 and 95%. They are such that the realized value of the 
time series, when it eventually occurs, will be included within these limits with the stated 
probability. To illustrate, Figure 1.1 shows the last 20 values of a time series culminating at
time $t$. Also shown are forecasts made from origin $t$ for lead times $l = 1, 2, \ldots, 13$, together
with the 50% probability limits.
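
As a rough illustration of how such limits might be computed, the Python sketch below assumes that the forecast error is normally distributed with a known standard deviation. Both the normality assumption and the numerical values are hypothetical choices made for this example; they are not taken from the series in Figure 1.1.

```python
from statistics import NormalDist

def probability_limits(forecast, std_error, coverage):
    """Return (lower, upper) limits intended to contain the future value with
    probability `coverage`, assuming a normally distributed forecast error."""
    # Two-sided limits: leave (1 - coverage) / 2 probability in each tail.
    u = NormalDist().inv_cdf(0.5 + coverage / 2.0)
    return forecast - u * std_error, forecast + u * std_error

# Hypothetical forecast of 100.0 with an assumed error standard deviation of 4.0.
lower50, upper50 = probability_limits(100.0, 4.0, 0.50)
lower95, upper95 = probability_limits(100.0, 4.0, 0.95)
print(f"50% limits: ({lower50:.2f}, {upper50:.2f})")
print(f"95% limits: ({lower95:.2f}, {upper95:.2f})")
```

Under these assumptions the 95% limits are necessarily wider than the 50% limits, since a larger share of the error distribution has to fall between them.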

Methods for obtaining forecasts and estimating probability limits are discussed in detail
in Chapter 5. These forecasting methods are developed based on the assumption that the
time series $z_t$ follows a stochastic model of known form. Consequently, in Chapters 3
and 4 a useful class of such time series models that might be appropriate to represent the
behavior of a series $z_t$, called autoregressive integrated moving average (ARIMA) models,
is introduced and many of their properties are studied. Subsequently, in Chapters 6, 7,
and 8 the practical matter of how these models may be developed for actual time series data
is explored, and the methods are described through the three-stage procedure of tentative
model identification or specification, estimation of model parameters, and model checking
and diagnostics.

FIGURE 1.1 Values of a time series with forecast function and 50% probability limits.


1.1.2 Estimation of Transfer Functions 

A topic of considerable industrial interest is the study of process dynamics discussed, for
example, by Astrom and Bohlin (1966, pp. 96-111) and Hutchinson and Shelton (1967).
Such a study is made (1) to achieve better control of existing plants and (2) to improve the
design of new plants. In particular, several methods have been proposed for estimating the
transfer function of plant units from process records consisting of an input time series $X_t$
and an output time series $Y_t$. Sections of such records are shown in Figure 1.2, where the
input $X_t$ is the rate of air supply and the output $Y_t$ is the concentration of carbon dioxide
produced in a furnace. The observations were made at 9-second intervals. A hypothetical
impulse response function $v_j$, $j = 0, 1, 2, \ldots$, which determines the transfer function for the
system through a dynamic linear relationship between input $X_t$ and output $Y_t$ of the form
$Y_t = \sum_{j=0}^{\infty} v_j X_{t-j}$, is also shown in the figure as a bar chart. Transfer function models that
relate an input process $X_t$ to an output process $Y_t$ are introduced in Chapter 11 and many
of their properties are examined.

FIGURE 1.2 Input and output time series in relation to a dynamic system.
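
The linear relationship between input and output described above can be sketched in a few lines of Python. The weights used here are hypothetical and are truncated to a finite list, and input values before the start of the record are taken to be zero; these are simplifying assumptions for illustration, not values estimated from the furnace data.

```python
def transfer_output(x, v):
    """Compute Y_t = sum over j of v[j] * X_{t-j} for a finite list of weights,
    treating input values before the start of the record as zero."""
    y = []
    for t in range(len(x)):
        total = 0.0
        for j, weight in enumerate(v):
            if t - j >= 0:
                total += weight * x[t - j]
        y.append(total)
    return y

# Hypothetical impulse response weights with a one-period delay.
v = [0.0, 0.5, 0.3, 0.2]
# A unit pulse in the input reproduces the weights in the output.
pulse_response = transfer_output([1.0, 0.0, 0.0, 0.0, 0.0], v)
# A unit step accumulates the weights toward their sum (here 1.0).
step_response = transfer_output([1.0] * 6, v)
print(pulse_response)
print(step_response)
```

The pulse case shows why the weights are called an impulse response: the output traces them out one period at a time.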

Methods for estimating transfer function models based on deterministic perturbations of 
the input, such as step, pulse, and sinusoidal changes, have not always been successful. This 
is because, for perturbations of a magnitude that are relevant and tolerable, the response 
of the system may be masked by uncontrollable disturbances referred to collectively as 
noise. Statistical methods for estimating transfer function models that make allowance for 
noise in the system are described in Chapter 12. The estimation of dynamic response is of 
considerable interest in economics, engineering, biology, and many other fields. 

Another important application of transfer function models is in forecasting. If, for 
example, the dynamic relationship between two time series $Y_t$ and $X_t$ can be determined,
past values of both series may be used in forecasting $Y_t$. In some situations, this approach
can lead to a considerable reduction in the errors of the forecasts. 

1.1.3 Analysis of Effects of Unusual Intervention Events to a System 

In some situations, it may be known that certain exceptional external events, intervention
events, could have affected the time series $z_t$ under study. Examples of such intervention
events include the incorporation of new environmental regulations, economic policy
changes, strikes, and special promotion campaigns. Under such circumstances, we may
use transfer function models, as discussed in Section 1.1.2, to account for the effects of
the intervention event on the series $z_t$, but where the “input” series will be in the form
of a simple indicator variable taking only the values 1 and 0 to indicate (qualitatively) the
presence or absence of the event.
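
A minimal Python sketch of such indicator inputs follows. The distinction between an event that persists (a step) and one that lasts a single period (a pulse), the baseline level, and the effect size `omega` are all hypothetical choices made for this illustration.

```python
def step_indicator(n, start):
    """1 from time index `start` onward and 0 before: the event persists."""
    return [1 if t >= start else 0 for t in range(n)]

def pulse_indicator(n, at):
    """1 only at time index `at` and 0 elsewhere: the event is transient."""
    return [1 if t == at else 0 for t in range(n)]

step = step_indicator(8, 5)
pulse = pulse_indicator(8, 5)

# Hypothetical example: a constant baseline of 10.0 whose level changes by
# omega = -2.0 once the event is present.
omega = -2.0
baseline = [10.0] * 8
observed = [b + omega * s for b, s in zip(baseline, step)]
print(step, pulse, observed)
```

In this simplified setting the quantity of interest in an intervention analysis corresponds to `omega`, which would be estimated from the data rather than assumed.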

In these cases, the intervention analysis is undertaken to obtain a quantitative measure 
of the impact of the intervention event on the time series of interest. For example, Box
and Tiao (1975) used intervention models to study and quantify the impact of air pollution
controls on smog-producing oxidant levels in the Los Angeles area and of economic
controls on the consumer price index in the United States. Alternatively, the intervention
analysis may be undertaken to adjust for any unusual values in the series $z_t$ that might
have resulted as a consequence of the intervention event. This will ensure that the results 
of the time series analysis of the series, such as the structure of the fitted model, estimates 
of model parameters, and forecasts of future values, are not seriously distorted by the 
influence of these unusual values. Models for intervention analysis and their use, together 
with consideration of the related topic of detection of outlying or unusual values in a time 
series, are presented in Chapter 13. 

1.1.4 Analysis of Multivariate Time Series 

For many problems in business, economics, engineering, and physical and environmental 
sciences, time series data may be available on several related variables of interest. A more 
informative and effective analysis is often possible by considering individual series as 
components of a multivariate or vector time series and analyzing the series jointly. For 
$k$ related time series variables of interest in a dynamic system, we may denote the series as
$z_{1t}, z_{2t}, \ldots, z_{kt}$, and let $\boldsymbol{Z}_t = (z_{1t}, \ldots, z_{kt})'$ denote the $k \times 1$ time series vector at time $t$.

Methods of multivariate time series analysis are used to study the dynamic relationships
among the several time series that comprise the vector $\boldsymbol{Z}_t$. This involves the development
of statistical models and methods of analysis that adequately describe the interrelationships 
among the series. Two main purposes for analyzing and modeling the vector of time series
jointly are to gain an understanding of the dynamic relationships over time among the 
series and to improve accuracy of forecasts for individual series by utilizing the additional 
information available from the related series in the forecasts for each series. Multivariate 
time series models and methods for analysis and forecasting of multivariate series based 
on these models are considered in Chapter 14. 

1.1.5 Discrete Control Systems 

In the past, to the statistician, the words “process control” have usually meant the quality 
control techniques developed originally by Shewhart (1931) in the United States (see 
also Dudding and Jennet, 1942). Later on, the sequential aspects of quality control were 
emphasized, leading to the introduction of cumulative sum charts by Page (1957, 1961) and
Barnard (1959) and the geometric moving average charts of Roberts (1959). Such basic 
charts are frequently employed in industries concerned with the manufacture of discrete 
“parts” as one aspect of what is called statistical process control (SPC). In particular (see 
Deming, 1986), they are used for continuous monitoring of a process. That is, they are used 
to supply a continuous screening mechanism for detecting assignable (or special) causes 
of variation. Appropriate display of plant data ensures that significant changes are quickly 
brought to the attention of those responsible for running the process. Knowing the answer to 
the question “when did a change of this particular kind occur?” we may be able to answer
the question “why did it occur?” Hence a continuous incentive for process stabilization 
and improvement can be achieved. 

By contrast, in the process and chemical industries, various forms of feedback and 
feedforward adjustment have been used in what we will call engineering process control 
(EPC). Because the adjustments made by engineering process control are usually computed 
and applied automatically, this type of control is sometimes called automatic process 
control (APC). However, the manner in which these adjustments are made is a matter of 
convenience. This type of control is necessary when there are inherent disturbances or 
noise in the system inputs that are impossible or impractical to remove. When we can 
measure fluctuations in an input variable that can be observed but not changed, it may 
be possible to make appropriate compensatory changes in some other control variable. 
This is referred to as feedforward control. Alternatively, or in addition, we may be able 
to use the deviation from target or “error signal” of the output characteristic itself to 
calculate appropriate compensatory changes in the control variable. This is called feedback 
control. Unlike feedforward control, this mode of correction can be employed even when 
the source of the disturbances is not accurately known or the magnitude of the disturbance 
is not measured. 
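
The feedback idea can be illustrated with a deliberately simple Python sketch. It assumes that a unit change in the control variable changes the output by a known amount `gain` with no delay, and that each observed deviation from target is compensated in full or in part (`damping`). These names and this adjustment rule are hypothetical, chosen only to make the idea concrete; they are not the schemes developed later in the book.

```python
def feedback_control(disturbance, gain, damping=1.0):
    """Simulate adjustment of a control variable from the observed error signal.

    disturbance: deviation from target that would occur with no control action
    gain: assumed change in output per unit change in the control variable
    damping: fraction of each observed deviation to compensate (hypothetical)
    """
    setting = 0.0  # cumulative adjustment applied to the control variable
    errors, adjustments = [], []
    for d in disturbance:
        error = d + gain * setting            # observed deviation from target
        adjustment = -damping * error / gain  # compensatory change
        setting += adjustment
        errors.append(error)
        adjustments.append(adjustment)
    return errors, adjustments

# Hypothetical disturbance: the level shifts by 5 units at the fourth observation.
errors, adjustments = feedback_control([0, 0, 0, 5, 5, 5, 5], gain=2.0)
print(errors)
print(adjustments)
```

Note that the rule uses only the observed deviation from target; the source and size of the disturbance are never measured directly, which is the feature that distinguishes feedback from feedforward control.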

In Chapter 15, we draw on the earlier discussions in this book, on time series and 
transfer function models, to provide insight into the statistical aspects of these control 
methods and to appreciate better their relationships and different objectives. In particular, 
we show how some of the ideas of feedback control can be used to design simple charts 
for manually adjusting processes. For example, the upper chart of Figure 1.3 shows hourly 
measurements of the viscosity of a polymer made over a period of 42 hours. The viscosity 
is to be controlled about a target value of 90 units. As each viscosity measurement comes 
to hand, the process operator uses the nomogram shown in the middle of the figure to 
compute the adjustment to be made in the manipulated variable (gas rate). The lower chart 
of Figure 1.3 shows the adjustments made in accordance with the nomogram. 
FIGURE 1.3 Control of viscosity. Record of observed viscosity and of adjustments in gas rate
made using nomogram. 


1.2 STOCHASTIC AND DETERMINISTIC DYNAMIC 
MATHEMATICAL MODELS 

The idea of using a mathematical model to describe the behavior of a physical phenomenon 
is well established. In particular, it is sometimes possible to derive a model based on 
physical laws, which enables us to calculate the value of some time-dependent quantity 
nearly exactly at any instant of time. Thus, we might calculate the trajectory of a missile 
launched in a known direction with known velocity. If exact calculation were possible, 
such a model would be entirely deterministic. 

Probably no phenomenon is totally deterministic, however, because unknown factors 
can occur such as a variable wind velocity that can throw a missile slightly off course. In 
many problems, we have to consider a time-dependent phenomenon, such as monthly sales 
of newsprint, in which there are many unknown factors and for which it is not possible 
to write a deterministic model that allows exact calculation of the future behavior of the 
phenomenon. Nevertheless, it may be possible to derive a model that can be used to calculate 
the probability of a future value lying between two specified limits. Such a model is called 
a probability model or a stochastic model. The models for time series that are needed, 
for example, to achieve optimal forecasting and control, are in fact stochastic models. It 
is necessary in what follows to distinguish between the probability model or stochastic 
process, as it is sometimes called, and the actually observed time series. Thus, a time series 
$z_1, z_2, \ldots, z_N$ of $N$ successive observations is regarded as a sample realization from an
infinite population of such time series that could have been generated by the stochastic
process. Very often we will omit the word “stochastic” from “stochastic process” and 
talk about the “process.” 


1.2.1 Stationary and Nonstationary Stochastic Models for Forecasting and Control 

An important class of stochastic models for describing time series, which has received a 
great deal of attention, comprises what are called stationary models. Stationary models 
assume that the process remains in statistical equilibrium with probabilistic properties 
that do not change over time, in particular varying about a fixed constant mean level 
and with constant variance. However, forecasting has been of particular importance in 
industry, business, and economics, where many time series are often better represented as 
nonstationary and, in particular, as having no natural constant mean level over time. It is not 
surprising, therefore, that many of the economic forecasting methods originally proposed 
by Holt (1957, 1963), Winters (1960), Brown (1962), and the ICI monographs (Coutie, 
1964) that used exponentially weighted moving averages can be shown to be appropriate 
for a particular type of nonstationary process. Although such methods are too narrow to 
deal efficiently with all time series, the fact that they often give the right kind of forecast 
function supplies a clue to the kind of nonstationary model that might be useful in these 
problems. 

The stochastic model for which the exponentially weighted moving average forecast 
yields minimum mean square error (Muth, 1960) is a member of a class of nonstationary 
processes called autoregressive integrated moving average processes, which are discussed 
in Chapter 4. This wider class of processes provides a range of models, stationary and 
nonstationary, that adequately represent many of the time series met in practice. Our 
approach to forecasting has been first to derive an adequate stochastic model for the 
particular time series under study. As shown in Chapter 5, once an appropriate model has 
been determined for the series, the optimal forecasting procedure follows immediately. 
These forecasting procedures include the exponentially weighted moving average forecast 
as a special case. 


Some Simple Operators. We employ extensively the backward shift operator $B$, which 
is defined by $Bz_t = z_{t-1}$; hence $B^m z_t = z_{t-m}$. The inverse operation is performed by 
the forward shift operator $F = B^{-1}$ given by $Fz_t = z_{t+1}$; hence $F^m z_t = z_{t+m}$. Another 
important operator is the backward difference operator, $\nabla$, defined by $\nabla z_t = z_t - z_{t-1}$. 
This can be written in terms of $B$, since 

$$\nabla z_t = z_t - z_{t-1} = (1 - B)z_t$$
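As a small illustration of the backward shift and backward difference operators, the following R sketch applies them to a short numeric vector. The data values and the helper name `backshift` are hypothetical and chosen only for illustration; `diff()` is the base R differencing function.

```r
# Toy series z_1, ..., z_6 (illustrative values, not from the text)
z <- c(3, 5, 4, 8, 7, 9)

# Hypothetical helper: B^m z_t = z_{t-m}; values before the start are NA
backshift <- function(x, m = 1) c(rep(NA, m), head(x, -m))

Bz   <- backshift(z)     # z_{t-1}
nabz <- z - Bz           # (1 - B) z_t = z_t - z_{t-1}

# Base R's diff() returns the same differences without the leading NA
d1 <- diff(z)
```

The first element of `nabz` is `NA` because $z_0$ is not observed; `diff()` simply drops it.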

Linear Filter Model. The stochastic models we employ are based on an idea originally 
due to Yule (1927) that an observable time series $z_t$ in which successive values are highly 
dependent can frequently be regarded as generated from a series of independent “shocks” 
$a_t$. These shocks are random drawings from a fixed distribution, usually assumed normal 
and having mean zero and variance $\sigma_a^2$. Such a sequence of independent random variables 
$a_t, a_{t-1}, a_{t-2}, \ldots$ is called a white noise process. 

The white noise process $a_t$ is supposed transformed to the process $z_t$ by what is called a 
linear filter, as shown in Figure 1.4. The linear filtering operation simply takes a weighted 
sum of previous random shocks $a_t$, so that 

$$z_t = \mu + a_t + \psi_1 a_{t-1} + \psi_2 a_{t-2} + \cdots = \mu + \psi(B)a_t \tag{1.2.1}$$

FIGURE 1.4 Representation of a time series as the output from a linear filter. 

In general, $\mu$ is a parameter that determines the “level” of the process, and 

$$\psi(B) = 1 + \psi_1 B + \psi_2 B^2 + \cdots$$

is the linear operator that transforms $a_t$ into $z_t$ and is called the transfer function of the filter. 
The model representation (1.2.1) can allow for a flexible range of patterns of dependence 
among values of the process $\{z_t\}$ expressed in terms of the independent (unobservable) 
random shocks $a_t$. 

The sequence $\psi_1, \psi_2, \ldots$ formed by the weights may, theoretically, be finite or infinite. If 
this sequence is finite, or infinite and absolutely summable in the sense that $\sum_{j=0}^{\infty} |\psi_j| < \infty$, 
the filter is said to be stable and the process $z_t$ is stationary. The parameter $\mu$ is then the 
mean about which the process varies. Otherwise, $z_t$ is nonstationary and $\mu$ has no specific 
meaning except as a reference point for the level of the process. 
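The linear filter (1.2.1) can be sketched in R for a finite set of weights. This is a minimal illustration under assumed values: the level `mu`, the two weights in `psi`, the seed, and the series length are arbitrary choices, and `stats::filter()` with `sides = 1` is used as the one-sided weighted sum.

```r
set.seed(1)                 # arbitrary seed for reproducibility
mu  <- 10                   # level of the process (illustrative)
psi <- c(0.8, 0.4)          # psi_1, psi_2 (finite, hence a stable filter)
a   <- rnorm(200)           # white noise shocks with mean 0 and variance 1

# One-sided convolution: a_t + psi_1 a_{t-1} + psi_2 a_{t-2}
z <- mu + stats::filter(a, filter = c(1, psi),
                        method = "convolution", sides = 1)

# The first two values are NA because a_0 and a_{-1} are not available
head(z)
```

Because the weight sequence is finite, the filter is stable and the simulated series fluctuates about `mu`.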

Autoregressive Models. A stochastic model that can be extremely useful in the representation 
of certain practically occurring series is the autoregressive model. In this model, the 
current value of the process is expressed as a finite, linear aggregate of previous values 
of the process and a random shock $a_t$. Let us denote the values of a process at equally 
spaced times $t, t - 1, t - 2, \ldots$ by $z_t, z_{t-1}, z_{t-2}, \ldots$. Also let $\tilde{z}_t = z_t - \mu$ be the series of 
deviations from $\mu$. Then 

$$\tilde{z}_t = \phi_1 \tilde{z}_{t-1} + \phi_2 \tilde{z}_{t-2} + \cdots + \phi_p \tilde{z}_{t-p} + a_t \tag{1.2.2}$$

is called an autoregressive (AR) process of order $p$. The reason for this name is that a linear 
model 

$$\tilde{z} = \phi_1 \tilde{x}_1 + \phi_2 \tilde{x}_2 + \cdots + \phi_p \tilde{x}_p + a$$

relating a “dependent” variable $z$ to a set of “independent” variables $x_1, x_2, \ldots, x_p$, plus 
a random error term $a$, is referred to as a regression model, and $z$ is said to be “regressed” 
on $x_1, x_2, \ldots, x_p$. In (1.2.2) the variable $z$ is regressed on previous values of itself; hence 
the model is autoregressive. If we define an autoregressive operator of order $p$ in terms of 
the backward shift operator $B$ by 

$$\phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p$$

the autoregressive model (1.2.2) may be written economically as 

$$\phi(B)\tilde{z}_t = a_t$$

The model contains $p + 2$ unknown parameters $\mu, \phi_1, \phi_2, \ldots, \phi_p, \sigma_a^2$, which in practice 
have to be estimated from the data. The additional parameter $\sigma_a^2$ is the variance of the white 
noise process $a_t$. 

It is not difficult to see that the autoregressive model is a special case of the linear filter 
model of (1.2.1). For example, we can eliminate $\tilde{z}_{t-1}$ from the right-hand side of (1.2.2) by 
substituting 

$$\tilde{z}_{t-1} = \phi_1 \tilde{z}_{t-2} + \phi_2 \tilde{z}_{t-3} + \cdots + \phi_p \tilde{z}_{t-p-1} + a_{t-1}$$

Similarly, we can substitute for $\tilde{z}_{t-2}$, and so on, to yield eventually an infinite series in 
the $a$'s. Consider, specifically, the simple first-order ($p = 1$) AR process, $\tilde{z}_t = \phi \tilde{z}_{t-1} + a_t$. 
After $m$ successive substitutions of $\tilde{z}_{t-j} = \phi \tilde{z}_{t-j-1} + a_{t-j}$, $j = 1, \ldots, m$, in the right-hand 
side we obtain 

$$\tilde{z}_t = \phi^{m+1} \tilde{z}_{t-m-1} + a_t + \phi a_{t-1} + \phi^2 a_{t-2} + \cdots + \phi^m a_{t-m}$$

In the limit as $m \to \infty$ this leads to the convergent infinite series representation 
$\tilde{z}_t = \sum_{j=0}^{\infty} \phi^j a_{t-j}$ with $\psi_j = \phi^j$, $j \ge 1$, provided that $|\phi| < 1$. Symbolically, in the general AR 
case we have that 

$$\phi(B)\tilde{z}_t = a_t$$

is equivalent to 

$$\tilde{z}_t = \phi^{-1}(B)a_t = \psi(B)a_t$$

with $\psi(B) = \phi^{-1}(B) = \sum_{j=0}^{\infty} \psi_j B^j$. 

Autoregressive processes can be stationary or nonstationary. For the process to be 
stationary, the $\phi$'s must be such that the weights $\psi_1, \psi_2, \ldots$ in $\psi(B) = \phi^{-1}(B)$ form a 
convergent series. The necessary requirement for stationarity is that the autoregressive 
operator, $\phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p$, considered as a polynomial in $B$ of degree 
$p$, must have all roots of $\phi(B) = 0$ greater than 1 in absolute value; that is, all roots must 
lie outside the unit circle. For the first-order AR process $\tilde{z}_t = \phi \tilde{z}_{t-1} + a_t$ this condition 
reduces to the requirement that $|\phi| < 1$, as the argument above has already indicated. 
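Both points can be checked numerically with base R. The sketch below is illustrative only: the parameter values are arbitrary, `ARMAtoMA()` from the stats package returns the weights of the equivalent linear filter, and `polyroot()` finds the roots of the autoregressive polynomial.

```r
phi <- 0.6                                # AR(1) parameter (illustrative)

# psi_1, ..., psi_5 of the equivalent linear filter; theory gives phi^j
psi        <- ARMAtoMA(ar = phi, lag.max = 5)
psi_theory <- phi^(1:5)

# Stationarity check for an AR(2) operator 1 - phi_1 B - phi_2 B^2.
# polyroot() takes the coefficients in increasing powers of B.
phi2  <- c(0.5, 0.3)                      # illustrative values
roots <- polyroot(c(1, -phi2))
is_stationary <- all(Mod(roots) > 1)      # all roots outside the unit circle
```

For these values the two roots are roughly 1.17 and -2.84, so the AR(2) operator is stationary.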

Moving Average Models. The autoregressive model (1.2.2) expresses the deviation $\tilde{z}_t$ of 
the process as a finite weighted sum of $p$ previous deviations $\tilde{z}_{t-1}, \tilde{z}_{t-2}, \ldots, \tilde{z}_{t-p}$ of the 
process, plus a random shock $a_t$. Equivalently, as we have just seen, it expresses $\tilde{z}_t$ as an 
infinite weighted sum of the $a$'s. 

Another kind of model, of great practical importance in the representation of observed 
time series, is the finite moving average process. Here we take $\tilde{z}_t$, linearly dependent on a 
finite number $q$ of previous $a$'s. Thus, 

$$\tilde{z}_t = a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2} - \cdots - \theta_q a_{t-q} \tag{1.2.3}$$

is called a moving average (MA) process of order $q$. The name “moving average” is 
somewhat misleading because the weights $1, -\theta_1, -\theta_2, \ldots, -\theta_q$, which multiply the $a$'s, 
need not total unity nor need they be positive. However, this nomenclature is in common 
use, and therefore we employ it. 

If we define a moving average operator of order $q$ by 

$$\theta(B) = 1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q$$

the moving average model may be written economically as 

$$\tilde{z}_t = \theta(B)a_t$$

It contains $q + 2$ unknown parameters $\mu, \theta_1, \ldots, \theta_q, \sigma_a^2$, which in practice have to be 
estimated from the data. 

Mixed Autoregressive-Moving Average Models. To achieve greater flexibility in fitting 
of actual time series, it is sometimes advantageous to include both autoregressive and 
moving average terms in the model. This leads to the mixed autoregressive-moving average 
(ARMA) model: 

$$\tilde{z}_t = \phi_1 \tilde{z}_{t-1} + \cdots + \phi_p \tilde{z}_{t-p} + a_t - \theta_1 a_{t-1} - \cdots - \theta_q a_{t-q} \tag{1.2.4}$$

or 

$$\phi(B)\tilde{z}_t = \theta(B)a_t$$

The model employs $p + q + 2$ unknown parameters $\mu, \phi_1, \ldots, \phi_p, \theta_1, \ldots, \theta_q, \sigma_a^2$, that are 
estimated from the data. This model may also be written in the form of the linear filter (1.2.1) 
as $\tilde{z}_t = \phi^{-1}(B)\theta(B)a_t = \psi(B)a_t$, with $\psi(B) = \phi^{-1}(B)\theta(B)$. In practice, it is frequently true 
that an adequate representation of actually occurring stationary time series can be obtained 
with autoregressive, moving average, or mixed models, in which $p$ and $q$ are not greater 
than 2 and often less than 2. We discuss the classes of autoregressive, moving average, and 
mixed models in much greater detail in Chapters 3 and 4. 
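The linear filter form of a mixed model can be explored with the stats functions `ARMAtoMA()` and `arima.sim()`. The sketch below uses arbitrary parameter values for an ARMA(1, 1) model. One point needs care: R writes the moving average part with plus signs, so a parameter $\theta_1$ in the sign convention of (1.2.4) is passed to R as `ma = -theta`.

```r
phi   <- 0.5      # phi_1 (illustrative)
theta <- 0.3      # theta_1 in the sign convention of (1.2.4)

# R uses a_t + b_1 a_{t-1} for the MA part, so theta enters as ma = -theta
psi <- ARMAtoMA(ar = phi, ma = -theta, lag.max = 4)

# For ARMA(1,1) the weights are psi_j = (phi - theta) * phi^(j - 1)
psi_theory <- (phi - theta) * phi^(0:3)

set.seed(2)       # arbitrary seed
z_tilde <- arima.sim(model = list(ar = phi, ma = -theta), n = 200)
```

The two parameters generate an infinite but geometrically decaying sequence of $\psi$ weights, which is what makes the mixed model economical.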

Nonstationary Models. Many series actually encountered in industry or business (e.g., 
stock prices and sales figures) exhibit nonstationary behavior and in particular do not vary 
about a fixed mean. Such series may nevertheless exhibit homogeneous behavior over time 
of a kind. In particular, although the general level about which fluctuations are occurring 
may be different at different times, the broad behavior of the series, when differences in 
level are allowed for, may be similar over time. We show in Chapter 4 and later chapters that 
such behavior may often be represented by a model in terms of a generalized autoregressive 
operator $\varphi(B)$, in which one or more of the zeros of the polynomial $\varphi(B)$ [i.e., one or more 
of the roots of the equation $\varphi(B) = 0$] lie on the unit circle. In particular, if there are $d$ unit 
roots and all other roots lie outside the unit circle, the operator $\varphi(B)$ can be written 

$$\varphi(B) = \phi(B)(1 - B)^d$$

where $\phi(B)$ is a stationary autoregressive operator. Thus, a model that can represent 
homogeneous nonstationary behavior is of the form 

$$\varphi(B)z_t = \phi(B)(1 - B)^d z_t = \theta(B)a_t$$

that is, 

$$\phi(B)w_t = \theta(B)a_t \tag{1.2.5}$$

where 

$$w_t = (1 - B)^d z_t = \nabla^d z_t \tag{1.2.6}$$

Thus, homogeneous nonstationary behavior can sometimes be represented by a model that 
calls for the $d$th difference of the process to be stationary. In practice, $d$ is usually 0, 1, or 
at most 2, with $d = 0$ corresponding to stationary behavior. 

The process defined by (1.2.5) and (1.2.6) provides a powerful model for describing 
stationary and nonstationary time series and is called an autoregressive integrated moving 
average process, of order $(p, d, q)$, or ARIMA$(p, d, q)$ process. The process is defined by 

$$w_t = \phi_1 w_{t-1} + \cdots + \phi_p w_{t-p} + a_t - \theta_1 a_{t-1} - \cdots - \theta_q a_{t-q} \tag{1.2.7}$$

with $w_t = \nabla^d z_t$. Note that if we replace $w_t$ by $z_t - \mu$ when $d = 0$, the model (1.2.7) 
includes the stationary mixed model (1.2.4) as a special case, and also the pure autoregressive 
model (1.2.2) and the pure moving average model (1.2.3). 

The reason for inclusion of the word “integrated” (which should perhaps more 
appropriately be “summed”) in the ARIMA title is as follows. The relationship, which is 
the inverse to (1.2.6), is $z_t = S^d w_t$, where $S = \nabla^{-1} = (1 - B)^{-1} = 1 + B + B^2 + \cdots$ is the 
summation operator defined by 

$$Sw_t = \sum_{j=0}^{\infty} w_{t-j} = w_t + w_{t-1} + w_{t-2} + \cdots$$

Thus, the general ARIMA process may be generated by summing or “integrating” the 
stationary ARMA process $w_t$ $d$ times. In Chapter 9, we describe how a special form of the 
model (1.2.7) can be employed to represent seasonal time series. The chapter also includes 
a discussion of regression models where the errors are autocorrelated and follow an ARMA 
process. 
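The inverse relationship between summing and differencing is easy to verify in R. The sketch below is a finite-sample illustration under an explicit assumption: the series is taken to be zero before the start of the record, so that `cumsum()` plays the role of the summation operator. The AR(1) parameter, seed, and length are arbitrary.

```r
set.seed(3)                                              # arbitrary seed
w <- as.numeric(arima.sim(model = list(ar = 0.5), n = 100))   # stationary w_t

# "Integrate" once: z_t = w_1 + ... + w_t
z <- cumsum(w)

# Differencing undoes the summation: (1 - B) z_t = w_t for t >= 2
w_back <- diff(z)

# d = 2: sum twice, then difference twice
z2      <- cumsum(cumsum(w))
w_back2 <- diff(z2, differences = 2)
```

Each difference costs one observation at the start of the series, so `w_back` matches `w[-1]` and `w_back2` matches `w[-(1:2)]`.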

Chapter 10 includes material that may be considered more specialized and that either 
supplements or extends the material presented in the earlier chapters. The chapter begins 
with a discussion of unit root testing that may be used as a supplementary tool to determine 
if a time series is nonstationary and can be made stationary through differencing. This 
is followed by a discussion of conditionally heteroscedastic models such as the ARCH 
and GARCH models. These models assume that the conditional variance of an observation 
given its past varies over time and are useful for modeling time-varying volatility in economic 
and financial time series, in particular. In Chapter 10, we also discuss nonlinear time series 
models and fractionally integrated long-memory processes that allow for certain more 
general features in a time series than are possible using the linear ARIMA models. 


1.2.2 Transfer Function Models 

An important type of dynamic relationship between a continuous input and a continuous 
output, for which many physical examples can be found, is that in which the deviations of 
input $X$ and output $Y$, from appropriate mean values, are related by a linear differential 
equation. In a similar way, for discrete data, in Chapter 11 we represent the transfer 
relationship between an output $Y$ and an input $X$, each measured at equispaced times, by 
the difference equation 

$$(1 + \xi_1 \nabla + \cdots + \xi_r \nabla^r)Y_t = (\eta_0 + \eta_1 \nabla + \cdots + \eta_s \nabla^s)X_{t-b} \tag{1.2.8}$$

in which the differential operator $D = d/dt$ is replaced by the difference operator $\nabla = 
1 - B$. An expression of the form (1.2.8), containing only a few parameters ($r \le 2$, $s \le 2$), 
may often be used as an approximation to a dynamic relationship whose true nature is more 
complex. 

The linear model (1.2.8) may be written equivalently in terms of past values of the input 
and output by substituting $B = 1 - \nabla$ in (1.2.8), that is, 

$$(1 - \delta_1 B - \cdots - \delta_r B^r)Y_t = (\omega_0 - \omega_1 B - \cdots - \omega_s B^s)X_{t-b} = (\omega_0 B^b - \omega_1 B^{b+1} - \cdots - \omega_s B^{b+s})X_t \tag{1.2.9}$$

or 

$$\delta(B)Y_t = \omega(B)B^b X_t = \Omega(B)X_t$$

Alternatively, we can say that the output $Y_t$ and the input $X_t$ are linked by a linear filter 

$$Y_t = v_0 X_t + v_1 X_{t-1} + v_2 X_{t-2} + \cdots = v(B)X_t \tag{1.2.10}$$

for which the transfer function 

$$v(B) = v_0 + v_1 B + v_2 B^2 + \cdots \tag{1.2.11}$$

can be expressed as a ratio of two polynomial operators, 

$$v(B) = \frac{\Omega(B)}{\delta(B)} = \delta^{-1}(B)\Omega(B)$$

The linear filter (1.2.10) is said to be stable if the series (1.2.11) converges for $|B| \le 1$, 
equivalently, if the coefficients $\{v_j\}$ are absolutely summable, $\sum_{j=0}^{\infty} |v_j| < \infty$. The 
sequence of weights $v_0, v_1, v_2, \ldots$, which appear in the transfer function (1.2.11), is called 
the impulse response function. We note that for the model (1.2.9), the first $b$ weights 
$v_0, v_1, \ldots, v_{b-1}$ are zero. A hypothetical impulse response function for the system of 
Figure 1.2 is shown in the center of that diagram. 
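The impulse response weights of a simple rational transfer function can be computed recursively. The sketch below assumes the first-order case $(1 - \delta B)Y_t = \omega_0 X_{t-b}$ with arbitrary illustrative values of $\delta$, $\omega_0$, and $b$, and uses the recursive mode of `stats::filter()` to solve for the weights.

```r
delta  <- 0.7    # delta_1 (illustrative; |delta| < 1 gives a stable filter)
omega0 <- 2      # omega_0 (illustrative)
b      <- 3      # pure delay in sampling intervals
n      <- 12     # number of weights v_0, ..., v_{n-1} to compute

# Right-hand side of (1.2.9) applied to a unit impulse at time 0:
# omega_0 B^b puts omega_0 at lag b and zeros elsewhere
rhs <- c(rep(0, b), omega0, rep(0, n - b - 1))

# Solve (1 - delta B) v_j = rhs_j recursively: v_j = delta * v_{j-1} + rhs_j
v <- as.numeric(stats::filter(rhs, filter = delta, method = "recursive"))
```

The first `b` weights are zero, and from lag $b$ onward the weights decay geometrically as $\omega_0 \delta^{j-b}$, so they are absolutely summable.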

Models with Superimposed Noise. We have seen that the problem of estimating an 
appropriate model, linking an output $Y_t$ and an input $X_t$, is equivalent to estimating the transfer 
function $v(B) = \delta^{-1}(B)\Omega(B)$, for example, specifying the parametric form of the transfer 
function $v(B)$ and estimating its parameters. However, this problem is complicated in 
practice by the presence of noise $N_t$, which we assume corrupts the true relationship between 
input and output according to 

$$Y_t = v(B)X_t + N_t$$

where $N_t$ and $X_t$ are independent processes. Suppose, as indicated by Figure 1.5, that the 
noise $N_t$ can be described by a stationary or nonstationary stochastic model of the form 
(1.2.5) or (1.2.7), that is, 

$$N_t = \psi(B)a_t = \varphi^{-1}(B)\theta(B)a_t$$

FIGURE 1.5 Transfer function model for dynamic system with superimposed noise model. 

Then the observed relationship between output and input will be 

$$Y_t = v(B)X_t + \psi(B)a_t = \delta^{-1}(B)\Omega(B)X_t + \varphi^{-1}(B)\theta(B)a_t \tag{1.2.12}$$

In practice, it is necessary to estimate the transfer function 

$$\psi(B) = \varphi^{-1}(B)\theta(B)$$

of the linear filter describing the noise, in addition to the transfer function $v(B) = 
\delta^{-1}(B)\Omega(B)$, which describes the dynamic relationship between the input and the 
output. Methods for doing this are discussed in Chapter 12. 
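To make the structure of (1.2.12) concrete, the following R sketch simulates an output series as the sum of a dynamic response to the input and an independently generated noise series. All numerical values (the delay of 3, $\delta = 0.7$, $\omega_0 = 2$, the ARMA parameters of the input and noise, and the seed) are arbitrary assumptions made for illustration.

```r
set.seed(4)                                               # arbitrary seed
n <- 300
X <- as.numeric(arima.sim(model = list(ar = 0.5), n = n)) # input series

# Dynamic part: (1 - 0.7 B) y_t = 2 X_{t-3}
X_lag  <- c(rep(0, 3), head(X, -3))        # X_{t-3}, zeros before the start
signal <- as.numeric(stats::filter(2 * X_lag, filter = 0.7,
                                   method = "recursive"))

# Noise part: ARMA(1,1) noise generated independently of X_t
# (R's ma = -0.4 corresponds to theta_1 = 0.4 in the book's convention)
N <- as.numeric(arima.sim(model = list(ar = 0.8, ma = -0.4), n = n))

Y <- signal + N                            # observed output, as in (1.2.12)
```

In an application only `X` and `Y` would be observed, and both the transfer function and the noise model would have to be estimated from them.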

1.2.3 Models for Discrete Control Systems 

As stated in Section 1.1.5, control is an attempt to compensate for disturbances that infect 
a system. Some of these disturbances are measurable; others are not measurable and only 
manifest themselves as unexplained deviations from the target of the characteristic to be 
controlled. To illustrate the general principles involved, consider the special case where 
unmeasured disturbances affect the output $Y_t$ of a system, and suppose that feedback control 
is employed to bring the output as close as possible to the desired target value by adjustments 
applied to an input variable $X_t$. This is illustrated in Figure 1.6. Suppose that $N_t$ represents 
the effect at the output of various unidentified disturbances within the system, which in the 
absence of control could cause the output to drift away from the desired target value or set 
point $T$. Then, despite adjustments that have been made to the process, an error 

$$e_t = Y_t - T = v(B)X_t + N_t - T$$

will occur between the output and its target value $T$. The object is to choose a control 
equation so that the errors $e_t$ have the smallest possible mean square. The control equation 
expresses the adjustment $x_t = X_t - X_{t-1}$ to be taken at time $t$, as a function of the present 
deviation $e_t$, previous deviations $e_{t-1}, e_{t-2}, \ldots$, and previous adjustments $x_{t-1}, x_{t-2}, \ldots$. 
The mechanism (human, electrical, pneumatic, or electronic) that carries out the control 
action called for by the control equation is called the controller. 

FIGURE 1.6 Feedback control scheme to compensate an unmeasured disturbance $N_t$. 


One procedure for designing a controller is equivalent to forecasting the deviation from 
target which would occur if no control were applied, and then calculating the adjustment 
that would be necessary to cancel out this deviation. It follows that the forecasting and 
control problems are closely linked. In particular, if a minimum mean square error forecast 
is used, the controller will produce minimum mean square error control. To forecast the 
deviation from target that could occur if no control were applied, it is necessary to build a 
model 

$$N_t = \psi(B)a_t = \varphi^{-1}(B)\theta(B)a_t$$

for the disturbance. Calculation of the adjustment $x_t$ that needs to be applied to the input 
at time $t$ to cancel out a predicted change at the output requires the building of a dynamic 
model with transfer function 

$$v(B) = \delta^{-1}(B)\Omega(B)$$

which links the input with output. The resulting adjustment $x_t$ will consist, in general, of a 
linear aggregate of previous adjustments and current and previous control errors. Thus the 
control equation will be of the form 

$$x_t = \zeta_1 x_{t-1} + \zeta_2 x_{t-2} + \cdots + \chi_0 e_t + \chi_1 e_{t-1} + \chi_2 e_{t-2} + \cdots \tag{1.2.13}$$

where $\zeta_1, \zeta_2, \ldots, \chi_0, \chi_1, \chi_2, \ldots$ are constants. 

It turns out that, in practice, minimum mean square error control sometimes results in 
unacceptably large adjustments x t to the input variable. Consequently, modified control 
schemes are employed that restrict the amount of variation in the adjustments. Some of 
these issues are discussed in Chapter 15. 
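The link between forecasting and control can be seen in a deliberately simple simulated case. The sketch below is not a scheme taken from the text; it assumes a pure-gain system with a one-period delay, $v(B) = gB$, a target of zero, and a random-walk disturbance, for which the minimum mean square error forecast of the next disturbance is its current value. Under these assumptions the control equation (1.2.13) reduces to $x_t = -e_t / g$.

```r
set.seed(5)                     # arbitrary seed
n <- 200
g <- 2                          # assumed system gain: output = g * X_{t-1}
target <- 0                     # target value T
a <- rnorm(n)                   # shocks driving the disturbance
N <- cumsum(a)                  # assumed random-walk disturbance N_t

e <- numeric(n)                 # control errors e_t = Y_t - T
X_prev <- 0                     # input level X_{t-1}, starting at zero
for (k in 1:n) {
  Y      <- g * X_prev + N[k]   # output responds to the previous input level
  e[k]   <- Y - target
  x_k    <- -e[k] / g           # adjustment: chi_0 = -1/g, other constants zero
  X_prev <- X_prev + x_k        # X_t = X_{t-1} + x_t
}

mse_controlled   <- mean(e^2)              # mean square error with feedback
mse_uncontrolled <- mean((N - target)^2)   # what the drift alone would give
```

With this controller the control error at each time equals the current shock $a_t$, which is the part of the disturbance that could not have been forecast one step ahead.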


1.3 BASIC IDEAS IN MODEL BUILDING 
1.3.1 Parsimony 

We have seen that the mathematical models we need to employ contain certain constants or 
parameters whose values must be estimated from the data. It is important, in practice, that 
we employ the smallest possible number of parameters for adequate representations. The 
central role played by this principle of parsimony (Tukey, 1961) in the use of parameters 
will become clearer as we proceed. As a preliminary illustration, we consider the following 
simple example. 

Suppose we fitted a dynamic model (1.2.9) of the form 

$$Y_t = (\omega_0 - \omega_1 B - \omega_2 B^2 - \cdots - \omega_s B^s)X_t \tag{1.3.1}$$

when dealing with a system that was adequately represented by 

$$(1 - \delta B)Y_t = \omega_0 X_t \tag{1.3.2}$$

The model (1.3.2) contains only two parameters, $\delta$ and $\omega_0$, but for $s$ sufficiently large, it 
could be represented approximately by the model (1.3.1), through 

$$Y_t = (1 - \delta B)^{-1}\omega_0 X_t = \omega_0(1 + \delta B + \delta^2 B^2 + \cdots)X_t$$

with $|\delta| < 1$. Because of experimental error, we could easily fail to recognize the 
relationship between the coefficients in the fitted equation. Thus, we might needlessly fit a 
relationship like (1.3.1), containing $s + 1$ parameters, where the much simpler form (1.3.2), 
containing only two, would have been adequate. This could, for example, lead to 
unnecessarily poor estimation of the output $Y_t$ for given values of the input $X_t, X_{t-1}, \ldots$. 
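The equivalence between the two-parameter form (1.3.2) and a long form like (1.3.1) can be checked numerically. In the sketch below the parameter values, seed, and input series are arbitrary, and the helper `approx_error` is hypothetical. The long form is truncated at $s$ lags with the coefficient of $B^j$ set to $\omega_0 \delta^j$ (so that $\omega_j = -\omega_0 \delta^j$ for $j \ge 1$ in the sign convention of (1.3.1)).

```r
delta  <- 0.6                  # illustrative value with |delta| < 1
omega0 <- 1.5                  # illustrative value
set.seed(6)                    # arbitrary seed
X <- rnorm(100)                # illustrative input series

# Two-parameter model (1.3.2): Y_t = delta * Y_{t-1} + omega0 * X_t
Y_parsimonious <- as.numeric(stats::filter(omega0 * X, filter = delta,
                                           method = "recursive"))

# Hypothetical helper: largest discrepancy of the form (1.3.1) truncated at s lags
approx_error <- function(s) {
  w <- omega0 * delta^(0:s)    # coefficients of B^0, ..., B^s
  Y_long <- as.numeric(stats::filter(X, filter = w,
                                     method = "convolution", sides = 1))
  max(abs(Y_long - Y_parsimonious), na.rm = TRUE)
}

errors <- sapply(c(2, 5, 10, 20), approx_error)
```

With $s = 20$ the long form reproduces the output almost exactly, but it does so with 21 coefficients where two parameters suffice.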

Our objective, then, must be to obtain adequate but parsimonious models. Forecasting 
and control procedures could be seriously deficient if these models were either inadequate 
or unnecessarily prodigal in the use of parameters. Care and effort are needed in selecting the 
model. The process of selection is necessarily iterative; that is, it is a process of evolution, 
adaptation, or trial and error and is outlined briefly below. 


1.3.2 Iterative Stages in the Selection of a Model 

If the physical mechanism of a phenomenon were completely understood, it would be 
possible theoretically to write down a mathematical expression that described it exactly. 
This would result in a mechanistic or theoretical model. In most instances the complete 
knowledge or large experimental resources needed to produce a mechanistic model are not 
available, and we must resort to an empirical model. Of course, the exact mechanistic model 
and the exclusively empirical model represent extremes. Models actually employed usually 
lie somewhere in between. In particular, we may use incomplete theoretical knowledge to 
indicate a suitable class of mathematical functions, which will then be fitted empirically 
(e.g., Box and Hunter, 1965); that is, the number of terms needed in the model and the 
numerical values of the parameters are estimated from experimental data. This is the 
approach that we adopt in this book. As we have indicated previously, the stochastic and 
dynamic models we describe can be justified, at least partially, on theoretical grounds as 
having the right general properties. 

It is normally supposed that successive values of the time series under consideration or 
of the input-output data are available for analysis. If possible, at least 50 and preferably 
100 observations or more should be used. In those cases where a past history of 50 or more 
observations is not available, one proceeds by using experience and past information to 
derive a preliminary model. This model may be updated from time to time as more data 
become available. 

FIGURE 1.7 Stages in the iterative approach to model building. 


In fitting dynamic models, a theoretical analysis can sometimes tell us not only the 
appropriate form for the model, but may also provide us with good estimates of the 
numerical values of its parameters. These values can then be checked later by analysis of 
data. 

Figure 1.7 summarizes the iterative approach to model building for forecasting and 
control, which is employed in this book. 

1. From the interaction of theory and practice, a useful class of models for the purposes 
at hand is considered. 

2. Because this class is too extensive to be conveniently fitted directly to data, rough 
methods for identifying subclasses of these models are developed. Such methods 
of model identification employ data and knowledge of the system to suggest an 
appropriate parsimonious subclass of models that may be tentatively entertained. In 
addition, the identification process can be used to yield rough preliminary estimates 
of the parameters in the model. 

3. The tentatively entertained model is fitted to data and its parameters estimated. The 
rough estimates obtained during the identification stage can now be used as starting 
values in more refined iterative methods for estimating the parameters, such as the 
nonlinear least squares and maximum likelihood methods. 

4. Diagnostic checks are applied with the goal of uncovering possible lack of fit and 
diagnosing the cause. If no lack of fit is indicated, the model is ready to use. If any 
inadequacy is found, the iterative cycle of identification, estimation, and diagnostic 
checking is repeated until a suitable representation is found. 

Identification, estimation, and diagnostic checking are discussed for univariate time 
series models in Chapters 6, 7, 8, and 9, for transfer function models in Chapter 12, for 
intervention models in Chapter 13, and for multivariate time series models in Chapter 14. 
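The identification, estimation, and diagnostic checking stages can be sketched in R with functions from the stats package. This is only an outline under stated assumptions: the data are simulated from an AR(1) model with an arbitrary parameter and seed, an AR(1) model is taken as the tentatively entertained model, and the Ljung-Box portmanteau test is used here as one possible diagnostic check.

```r
set.seed(7)                                          # arbitrary seed
z <- arima.sim(model = list(ar = 0.6), n = 500)      # simulated AR(1) data

# Identification: sample autocorrelations and partial autocorrelations
r  <- acf(z, lag.max = 20, plot = FALSE)
pr <- pacf(z, lag.max = 20, plot = FALSE)

# Estimation: fit the tentatively entertained AR(1) model by maximum likelihood
fit <- arima(z, order = c(1, 0, 0), method = "ML")

# Diagnostic checking: portmanteau test on the residual autocorrelations,
# with one degree of freedom removed for the fitted AR parameter
check <- Box.test(residuals(fit), lag = 10, type = "Ljung-Box", fitdf = 1)
check$p.value    # a small value would point to lack of fit
```

If the check indicated lack of fit, the cycle would return to the identification stage with a modified model.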

The model building procedures will be illustrated using actual time series with numerical 
calculations performed using the R software and other tools. A brief description of the R 
software is included in Appendix A1.1 along with references for further study. Exercises 
at the end of the chapters also make use of the software. 


APPENDIX A1.1 USE OF THE R SOFTWARE 

The R software for statistical computing and graphics is a common choice for data analysis 
and development of new statistical methods. R is available as Free Software under the terms 
of the Free Software Foundation’s GNU General Public License in source code form. It 
compiles and runs on all common operating systems including Windows, MacOS X, and 
Linux. The main website for the R project is http://www.r-project.org. 

The R environment consists of a base system, which is developed and maintained by the 
R Core Team, and a large set of user contributed packages. The base system provides the 
source code that implements the basic functionality of R. It also provides a set of standard 
packages that include commonly used probability distributions, graphical tools, classic 
datasets from the literature, and a set of statistical methods that include regression analysis 
and time series analysis. In addition to these base packages, there are now thousands of 
contributed packages developed by researchers around the world. Packages useful for time 
series modeling and forecasting include the stats package that is part of the base distribution 
and several contributed packages that are available for download. These include the TSA 
package by K-S Chan and Brian Ripley, the astsa package by David Stoffer, the Rmetrics 
packages fGarch and fUnitRoots for financial time series analysis by Diethelm Wuertz 
and associates, and the MTS package for multivariate time series analysis by Ruey Tsay. 
We use many functions from these packages in this book. We also use datasets available 
for download from the R datasets package, and the TSA and astsa packages. 

Both the base system and the contributed packages are distributed through a network 
of servers called the Comprehensive R Archive Network (CRAN) that can be accessed 
from the official R website. Contributed packages that are not part of the base distribution 
can be installed directly from the R prompt “>” using the command install.packages(). 
Under the Windows system, the installation can also be done from a drop-down list. The 
command will prompt the user to select a CRAN mirror, after which a list of packages 
available for installation appears. To use a specific package, it also needs to be loaded into 
the system at the start of each session. For example, the TSA package can be loaded using 
the commands library(TSA) or require(TSA). The command data() will list all datasets 
available in the loaded packages. The command data(airquality) will load the dataset 
airquality from the R datasets package into memory. Data stored in a text file can be read 
into R using the command read.table(). For a .csv file, the command is read.csv(). To get 
help on specific functions, e.g. the arima function which fits an ARIMA model to a time 
series, type help(arima) or ?arima. 
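The commands mentioned above might be used together as in the following short session. The installation and help commands are commented out because they need an internet connection or an interactive session; the temporary file is only a device to demonstrate read.csv() without touching the working directory.

```r
# install.packages("TSA")      # one-time installation from CRAN
# library(TSA)                 # load a contributed package in each session

data(airquality)               # load a dataset from the datasets package
head(airquality)               # first few rows

# Round trip through a .csv file to show read.csv()
path <- tempfile(fileext = ".csv")
write.csv(airquality, path, row.names = FALSE)
aq <- read.csv(path)

# help(arima)                  # or ?arima, opens the help page
```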

R is object-oriented software and allows the user to create many objects. For example, 
the command ts() will create a time series object. This has advantages for plotting the time 
series and for certain other applications. However, it is not necessary to create a time series 
object for many of the applications discussed in this book. The structure of the data in R 
can be examined using commands such as class(), str(), and summary(). 
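As an illustration, a plain numeric vector can be turned into a monthly time series object and then inspected. The example reuses the values of the AirPassengers series from the datasets package; the start date and frequency are those of that series.

```r
x <- as.numeric(AirPassengers)                   # plain numeric vector
z <- ts(x, start = c(1949, 1), frequency = 12)   # monthly series from January 1949

class(z)        # "ts"
str(z)          # compact display of the structure
summary(z)      # summary statistics of the values

# Time series objects carry their dates, so a year can be extracted directly
y1960 <- window(z, start = c(1960, 1), end = c(1960, 12))
```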

The data used for illustration in this book, as well as in some of the exercises, include 
a set of time series listed in Part Five of the book. These series are also available at 
http://pages.cs.wisc.edu/~reinsel/bjr-data/index.html. At least three of the series are also 
included in the R datasets package and can be accessed using the data() command 
described above. Some of the exercises require the use of R and it will be assumed 
that the reader is already familiar with the basics of R, which can be obtained by working 
through relevant chapters of texts such as Crawley (2007) and Adler (2010). Comprehensive 
documentation in the form of manuals, contributed documents, online help pages, and FAQ 
sheets is also available on the R website. Since R builds on the S language, a useful reference 
book is also Venables and Ripley (2002). 


EXERCISES 

1.1. The dataset airquality in the R datasets package includes information on daily air 
quality measurements in New York, May to September 1973. The variables included 
are mean ozone levels at Roosevelt Island, solar radiation at Central Park, average wind 
speed at LaGuardia Airport, and maximum daily temperature at LaGuardia Airport; 
see help(airquality) for details. 

(a) Load the dataset into R. 

(b) Investigate the structure of the dataset. 

(c) Plot each of the four series mentioned above using the plot() command in R; see 
help(plot) for details and examples. 

(d) Comment on the behavior of the four series. Do you see any issues that may 
require special attention in developing a time series model for each of the four 
series? 

1.2. Monthly totals of international airline passengers (in thousands of passengers), January 
1949-December 1960, are available as Series G in Part Five of this book. The data 
are also available as series AirPassengers in the R datasets package. 

(a) Load the dataset into R and examine the structure of the data. 

(b) Plot the data using R and describe the behavior of the series. 

(c) Perform a log transformation of the data and plot the resulting series. Compare 
the behavior of the original and log-transformed series. Do you see an advantage 
in using a log transformation for modeling purposes? 

1.3. Download a time series of your choosing from the Internet. Note that financial and 
economic time series are available from sources such as Google Finance and the 
Federal Reserve Economic Data (FRED) of the Federal Reserve Bank in St. Louis, Missouri, 
while climate data is available from NOAA’s National Climatic Data Center 
(NCDC). 

(a) Store the data in a text file or a .csv file and read the data into R. 

(b) Examine the properties of your series using plots or other appropriate tools. 

(c) Does your time series appear to be stationary? If not, would differencing and/or 
some other transformation make the series stationary? 


PART ONE 


STOCHASTIC MODELS AND THEIR 
FORECASTING 


In the first part of this book, which includes Chapters 2, 3, 4, and 5, a valuable class of 
stochastic models is described and its use in forecasting discussed. 

A model that describes the probability structure of a sequence of observations is called 
a stochastic process. A time series of N successive observations $\mathbf{z}' = (z_1, z_2, \ldots, z_N)$ is 
regarded as a sample realization, from an infinite population of such samples, which could 
have been generated by the process. A major objective of statistical investigation is to infer 
properties of the population from those of the sample. For example, to make a forecast is to 
infer the probability distribution of a future observation from the population, given a sample 
z of past values. To do this, we need ways of describing stochastic processes and time series, 
and we also need classes of stochastic models that are capable of describing practically 
occurring situations. An important class of stochastic processes discussed in Chapter 2 is the 
stationary processes. They are assumed to be in a specific form of statistical equilibrium, 
and in particular, vary over time in a stable manner about a fixed mean. Useful devices 
for describing the behavior of stationary processes are the autocorrelation function and the 
spectrum. 

Particular stationary stochastic processes of value in modeling time series are the 
autoregressive (AR), moving average (MA), and mixed autoregressive-moving average (ARMA) 
processes. The properties of these processes, in particular their autocorrelation structures, 
are described in Chapter 3. 

Because many practically occurring time series (e.g., stock prices and sales figures) have 
nonstationary characteristics, the stationary models introduced in Chapter 3 are developed 
further in Chapter 4 to give a useful class of nonstationary processes called autoregressive 
integrated moving-average (ARIMA) models. The use of all these models in forecasting 
time series is discussed in Chapter 5 and is illustrated with examples. 
2 


AUTOCORRELATION FUNCTION AND 
SPECTRUM OF STATIONARY 
PROCESSES 


A central feature in the development of time series models is an assumption of some form 
of statistical equilibrium. A particularly useful assumption of this kind (but an unduly 
restrictive one, as we will see later) is that of stationarity. Usually, a stationary time 
series can be usefully described by its mean, variance, and autocorrelation function or 
equivalently by its mean, variance, and spectral density function. In this chapter, we consider 
the properties of these functions and, in particular, the properties of the autocorrelation 
function, which will be used extensively in developing models for actual time series. 


2.1 AUTOCORRELATION PROPERTIES OF STATIONARY MODELS 

2.1.1 Time Series and Stochastic Processes 

Time Series. A time series is a set of observations generated sequentially over time. 
If the set is continuous, the time series is said to be continuous. If the set is discrete, 
the time series is said to be discrete. Thus, the observations from a discrete time series 
made at times $\tau_1, \tau_2, \ldots, \tau_t, \ldots, \tau_N$ may be denoted by $z(\tau_1), z(\tau_2), \ldots, z(\tau_t), \ldots, z(\tau_N)$. In 
this book, we consider only discrete time series where observations are made at a fixed 
interval h. When we have N successive values of such a series available for analysis, 
we write $z_1, z_2, \ldots, z_t, \ldots, z_N$ to denote observations made at equidistant time intervals 
$\tau_0 + h, \tau_0 + 2h, \ldots, \tau_0 + th, \ldots, \tau_0 + Nh$. For many purposes, the values of $\tau_0$ and h are 
unimportant, but if the observation times need to be defined exactly, these two values can 
be specified. If we adopt $\tau_0$ as the origin and h as the unit of time, we can regard $z_t$ as the 
observation at time t. 
FIGURE 2.1 Yields of 70 consecutive batches from a chemical process. 


Discrete time series may arise in two ways: 

1. By sampling a continuous time series: For example, in the situation shown in 
Figure 1.2, where the continuous input and output from a gas furnace were sampled at 
intervals of 9 seconds. 

2. By accumulating a variable over a period of time: Examples are rainfall, which is 
usually accumulated over a period such as a day or a month, and the yield from a 
batch process, which is accumulated over the batch time. For example, Figure 2.1 
shows a time series consisting of the yields from 70 consecutive batches of a chemical 
process. The series shown here is included as Series F in Part Five of this book. 

Deterministic and Statistical Time Series. If future values of a time series are exactly 
determined by some mathematical function such as 

$$z_t = \cos(2\pi f t)$$

the time series is said to be deterministic. If future values can be described only in terms of a 
probability distribution, the time series is said to be nondeterministic or simply a statistical 
time series. The batch data of Figure 2.1 provide an example of a statistical time series. 
Thus, although there is a well-defined high-low pattern in the series, it is impossible to 
forecast the exact yield for the next batch. It is with such statistical time series that we are 
concerned in this book. 

Stochastic Processes. A statistical phenomenon that evolves in time according to 
probabilistic laws is called a stochastic process. We will often refer to it simply as a process, 
omitting the word “stochastic.” The time series to be analyzed may then be thought of as 
a particular realization, produced by the underlying probability mechanism, of the system 
under study. In other words, in analyzing a time series we regard it as a realization of a 
stochastic process. 
FIGURE 2.2 Observed time series (thick line), with other time series representing realizations of 
the same stochastic process. 


For example, to analyze the batch data in Figure 2.1, we can imagine other sets of 
observations (other realizations of the underlying stochastic process), which might have 
been generated by the same chemical system, in the same N = 70 batches. Thus, Figure 2.2 
shows the yields from batches t = 21 to t = 30 (thick line), together with other time series 
that might have been obtained from the population of time series defined by the underlying 
stochastic process. It follows that we can regard the observation $z_t$ at a given time t, say 
t = 25, as a realization of a random variable $z_t$ with probability density function $p(z_t)$. 
Similarly, the observations at any two times, say $t_1 = 25$ and $t_2 = 27$, may be regarded 
as realizations of two random variables $z_{t_1}$ and $z_{t_2}$ with joint probability density function 
$p(z_{t_1}, z_{t_2})$. For illustration, Figure 2.3 shows contours of constant density for such a joint 
distribution, together with the marginal distribution at time $t_1$. In general, the observations 
making up an equispaced time series can be described by an N-dimensional random 
variable $(z_1, z_2, \ldots, z_N)$ with probability distribution $p(z_1, z_2, \ldots, z_N)$. 
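
The idea that an observed series is one realization among many can be sketched by simulation. The following is only an illustration: the AR(1) form, the coefficient -0.4, the level 50, and the names n_obs, n_real, and realizations are assumptions made for this sketch and are not a model fitted to the batch data.

```r
# Hypothetical stationary process used only for illustration.
set.seed(1)
n_obs <- 70    # length of each realization
n_real <- 5    # number of realizations to draw

# Each column of the matrix is one realization of the same process.
realizations <- replicate(
  n_real,
  50 + as.numeric(arima.sim(model = list(ar = -0.4), n = n_obs))
)

# Plot all realizations in grey and treat the first as the "observed" series.
matplot(realizations, type = "l", lty = 1, col = "grey",
        xlab = "t", ylab = expression(z[t]))
lines(realizations[, 1], lwd = 2)
```

Reading across a row of the matrix gives several draws of the random variable $z_t$ at one fixed time t, which is the sense in which each observation has its own probability distribution.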



FIGURE 2.3 Contours of constant density of a bivariate probability distribution describing a 
stochastic process at two times together with the marginal distribution at time $t_1$. 
2.1.2 Stationary Stochastic Processes 

A very special class of stochastic processes, called stationary processes, is based on the 
assumption that the process is in a particular state of statistical equilibrium. A stochastic 
process is said to be strictly stationary if its properties are unaffected by a change of 
time origin, that is, if the joint probability distribution associated with m observations 
$z_{t_1}, z_{t_2}, \ldots, z_{t_m}$, made at any set of times $t_1, t_2, \ldots, t_m$, is the same as that associated with 
m observations $z_{t_1+k}, z_{t_2+k}, \ldots, z_{t_m+k}$, made at times $t_1 + k, t_2 + k, \ldots, t_m + k$. Thus, for a 
discrete process to be strictly stationary, the joint distribution of any set of observations 
must be unaffected by shifting all the times of observation forward or backward by any 
integer amount k. 


Mean and Variance of a Stationary Process. When m = 1, the stationarity assumption 
implies that the probability distribution $p(z_t)$ is the same for all times t and may be written 
as p(z). Hence, the stochastic process has a constant mean 

$$\mu = E[z_t] = \int_{-\infty}^{\infty} z\,p(z)\,dz \tag{2.1.1}$$

which defines the level about which it fluctuates, and a constant variance 

$$\sigma_z^2 = E[(z_t - \mu)^2] = \int_{-\infty}^{\infty} (z - \mu)^2 p(z)\,dz \tag{2.1.2}$$

which measures its spread about this level. Since the probability distribution p(z) is the 
same for all times t, its shape can be inferred by forming the histogram of the observations 
$z_1, z_2, \ldots, z_N$, making up the observed time series. In addition, the mean $\mu$ of the stochastic 
process can be estimated by the sample mean 

$$\bar{z} = \frac{1}{N} \sum_{t=1}^{N} z_t \tag{2.1.3}$$

of the time series, and the variance $\sigma_z^2$ of the stochastic process can be estimated by the 
sample variance 

$$\hat{\sigma}_z^2 = \frac{1}{N} \sum_{t=1}^{N} (z_t - \bar{z})^2 \tag{2.1.4}$$

of the time series. 
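
The two estimates can be computed in a few lines of R. This is a minimal sketch on made-up numbers (the vector z below is hypothetical, not data from the book), and it assumes the sample variance uses the divisor N as in (2.1.4). R's built-in var() uses the divisor N - 1, so the two differ by the factor (N - 1)/N.

```r
# Hypothetical observations, chosen so the arithmetic is easy to follow.
z <- c(2, 4, 4, 4, 5, 5, 7, 9)
N <- length(z)

z_bar <- sum(z) / N                    # sample mean, as in (2.1.3)
sigma2_hat <- sum((z - z_bar)^2) / N   # sample variance with divisor N, as in (2.1.4)

# var() divides by N - 1, so rescale it to compare with the divisor-N estimate.
sigma2_from_var <- var(z) * (N - 1) / N

c(mean = z_bar, variance = sigma2_hat, rescaled_var = sigma2_from_var)
```

For a long series the difference between the two divisors is negligible, but it is worth knowing which one a given function uses.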

Autocovariance and Autocorrelation Coefficients. The stationarity assumption also 
implies that the joint probability distribution $p(z_{t_1}, z_{t_2})$ is the same for all times $t_1, t_2$, which 
are a constant interval apart. In particular, it follows that the covariance between values $z_t$ 
and $z_{t+k}$, separated by k intervals of time, or by lag k, must be the same for all t under 
the stationarity assumption. This covariance is called the autocovariance at lag k and is 
defined by 

$$\gamma_k = \operatorname{cov}[z_t, z_{t+k}] = E[(z_t - \mu)(z_{t+k} - \mu)] \tag{2.1.5}$$
FIGURE 2.4 Scatter diagrams at lags (a) k = 1 and (b) k = 2 for the batch data of Figure 2.1. 


Similarly, the autocorrelation at lag k is 

$$\rho_k = \frac{E[(z_t - \mu)(z_{t+k} - \mu)]}{\sqrt{E[(z_t - \mu)^2]\,E[(z_{t+k} - \mu)^2]}} = \frac{E[(z_t - \mu)(z_{t+k} - \mu)]}{\sigma_z^2}$$

since, for a stationary process, the variance $\sigma_z^2 = \gamma_0$ is the same at time t + k as at time t. 
Thus, the autocorrelation at lag k, that is, the correlation between $z_t$ and $z_{t+k}$, is 

$$\rho_k = \frac{\gamma_k}{\gamma_0} \tag{2.1.6}$$

which implies, in particular, that $\rho_0 = 1$. 

It also follows for a stationary process that the nature of the joint probability distribution 
$p(z_t, z_{t+k})$ of values separated by k intervals of time can be inferred by plotting a scatter 
diagram using pairs of values $(z_t, z_{t+k})$ of the time series, separated by a constant interval or 
lag k. For the batch data displayed in Figure 2.1, Figure 2.4(a) shows a scatter diagram for 
lag k = 1, obtained by plotting $z_{t+1}$ versus $z_t$, while Figure 2.4(b) shows a scatter diagram 
for lag k = 2, obtained by plotting $z_{t+2}$ versus $z_t$. We see that neighboring values of the 
time series are correlated. The correlation between $z_t$ and $z_{t+1}$ appears to be negative and 
the correlation between $z_t$ and $z_{t+2}$ positive. Figure 2.4 was generated in R as follows: 

> Yield = read.table("SeriesF.txt",header=TRUE)[,1]  # assumes the yields are in the first column
> y1=Yield[2:70]  # z[t+1]
> x1=Yield[1:69]  # z[t]
> y2=Yield[3:70]  # z[t+2]
> x2=Yield[1:68]  # z[t]
> win.graph(width=5,height=2.7,pointsize=5)  # Windows only; dev.new() is a portable alternative
> par(mfrow=c(1,2))  # Places two graphs side-by-side
> plot(y=y1,x=x1,ylab=expression(z[t + 1]),xlab=expression(z[t]),
       main="(a): k=1",type='p')
> abline(lsfit(x1,y1))  # least squares line through the lag-1 pairs
> plot(y=y2,x=x2,ylab=expression(z[t + 2]),xlab=expression(z[t]),
       main="(b): k=2",type='p')
> abline(lsfit(x2,y2))  # least squares line through the lag-2 pairs


2.1.3 Positive Definiteness and the Autocovariance Matrix 


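The covariance matrix associated with a stationary process for observations $(z_1, z_2, \ldots, z_n)$ 
made at n successive times is 

$$
\Gamma_n =
\begin{bmatrix}
\gamma_0 & \gamma_1 & \gamma_2 & \cdots & \gamma_{n-1} \\
\gamma_1 & \gamma_0 & \gamma_1 & \cdots & \gamma_{n-2} \\
\gamma_2 & \gamma_1 & \gamma_0 & \cdots & \gamma_{n-3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\gamma_{n-1} & \gamma_{n-2} & \gamma_{n-3} & \cdots & \gamma_0
\end{bmatrix}
= \sigma_z^2
\begin{bmatrix}
1 & \rho_1 & \rho_2 & \cdots & \rho_{n-1} \\
\rho_1 & 1 & \rho_1 & \cdots & \rho_{n-2} \\
\rho_2 & \rho_1 & 1 & \cdots & \rho_{n-3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\rho_{n-1} & \rho_{n-2} & \rho_{n-3} & \cdots & 1
\end{bmatrix}
= \sigma_z^2 P_n \tag{2.1.7}
$$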


A covariance matrix $\Gamma_n$ of this form, which is symmetric with constant elements on any 
diagonal, is called an autocovariance matrix, and the corresponding correlation matrix 
$P_n$ is called an autocorrelation matrix. Now, consider any linear function of the random 
variables $z_t, z_{t-1}, \ldots, z_{t-n+1}$: 

$$L_t = l_1 z_t + l_2 z_{t-1} + \cdots + l_n z_{t-n+1} \tag{2.1.8}$$

Since $\operatorname{cov}[z_i, z_j] = \gamma_{|j-i|}$ for a stationary process, the variance of $L_t$ is 

$$\operatorname{var}[L_t] = \sum_{i=1}^{n} \sum_{j=1}^{n} l_i l_j \gamma_{|j-i|}$$

which is necessarily greater than zero if the l's are not all zero. It follows that both 
the autocovariance matrix and the autocorrelation matrix are positive definite for any 
stationary process. Correspondingly, it is seen that both the autocovariance function $\{\gamma_k\}$ 
and the autocorrelation function $\{\rho_k\}$, viewed as functions of the lag k, are positive-definite 
functions in the sense that $\sum_{i=1}^{n} \sum_{j=1}^{n} l_i l_j \gamma_{|j-i|} > 0$ for every positive integer n and all 
constants $l_1, \ldots, l_n$ (not all zero). 
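
A numerical check of positive definiteness can be sketched in R. The autocorrelations $\rho_k = \phi^k$ with $\phi = 0.6$, the dimension n = 5, and the weights l are all hypothetical choices made for this illustration. The sketch builds the autocorrelation matrix with toeplitz(), inspects its eigenvalues, and evaluates the quadratic form for the variance of a linear function.

```r
# Hypothetical autocorrelations rho_k = phi^k, k = 0, ..., n - 1.
phi <- 0.6
n <- 5
rho <- phi^(0:(n - 1))

# Autocorrelation matrix: symmetric, constant along each diagonal.
P_n <- toeplitz(rho)

# A symmetric matrix is positive definite when all eigenvalues are positive.
eig <- eigen(P_n, symmetric = TRUE, only.values = TRUE)$values

# Variance of L_t = l_1 z_t + ... + l_n z_{t-n+1}, taking gamma_0 = 1,
# written as the quadratic form l' P_n l.
l <- c(1, -2, 0.5, 3, -1)   # arbitrary weights, not all zero
var_L <- drop(t(l) %*% P_n %*% l)

list(eigenvalues = eig, var_L = var_L)
```

If a proposed set of autocorrelations produced a negative eigenvalue, it could not be the autocorrelation function of any stationary process.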

Conditions Satisfied by the Autocorrelations of a Stationary Process. The positive 
definiteness of the autocorrelation matrix (2.1.7) implies that its determinant and all principal 
minors are greater than zero. In particular, for n = 2, 

$$\begin{vmatrix} 1 & \rho_1 \\ \rho_1 & 1 \end{vmatrix} > 0$$

so that 

$$1 - \rho_1^2 > 0$$

and hence 

$$-1 < \rho_1 < 1$$

Similarly, for n = 3, we must have 

$$\begin{vmatrix} 1 & \rho_1 \\ \rho_1 & 1 \end{vmatrix} > 0, \qquad \begin{vmatrix} 1 & \rho_2 \\ \rho_2 & 1 \end{vmatrix} > 0, \qquad \begin{vmatrix} 1 & \rho_1 & \rho_2 \\ \rho_1 & 1 & \rho_1 \\ \rho_2 & \rho_1 & 1 \end{vmatrix} > 0$$

which implies that 

$$-1 < \rho_1 < 1$$

$$-1 < \rho_2 < 1$$

$$-1 < \frac{\rho_2 - \rho_1^2}{1 - \rho_1^2} < 1$$

and so on. Since $P_n$ must be positive definite for all values of n, the autocorrelations of 
a stationary process must satisfy a very large number of conditions. As will be shown 
in Section 2.2.3, all of these conditions can be brought together in the definition of the 
spectrum. 
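
One way to read the third condition for n = 3 is as a restriction on $\rho_2$ once $\rho_1$ is given. The short derivation below is a sketch of that step; the value $\rho_1 = 0.8$ is an arbitrary illustration. The 3 x 3 determinant factors as

$$\begin{vmatrix} 1 & \rho_1 & \rho_2 \\ \rho_1 & 1 & \rho_1 \\ \rho_2 & \rho_1 & 1 \end{vmatrix} = 1 - 2\rho_1^2 + 2\rho_1^2\rho_2 - \rho_2^2 = (1 - \rho_2)(1 + \rho_2 - 2\rho_1^2)$$

and, since $1 - \rho_1^2 > 0$, multiplying the third inequality through by $1 - \rho_1^2$ gives

$$-(1 - \rho_1^2) < \rho_2 - \rho_1^2 < 1 - \rho_1^2 \quad\Longleftrightarrow\quad 2\rho_1^2 - 1 < \rho_2 < 1$$

For example, with $\rho_1 = 0.8$ the lag 2 autocorrelation must satisfy $\rho_2 > 2(0.64) - 1 = 0.28$, so a strongly positive $\rho_1$ rules out a negative $\rho_2$.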


Stationarity of Linear Functions. It follows from the definition of stationarity that the 
process $L_t$, obtained by performing the linear operation (2.1.8) on a stationary process $z_t$ 
for fixed n and fixed coefficients $l_1, \ldots, l_n$, is also stationary. The autocovariance of the 
process $L_t$, at a general lag k > 0, is given by 

$$\operatorname{cov}[L_t, L_{t-k}] = \sum_{i=1}^{n} \sum_{j=1}^{n} l_i l_j \operatorname{cov}[z_{t+1-i}, z_{t+1-k-j}] = \sum_{i=1}^{n} \sum_{j=1}^{n} l_i l_j \gamma_{|k+j-i|}$$

In particular, the first difference $\nabla z_t = z_t - z_{t-1}$ and higher differences $\nabla^d z_t$ are 
stationary. This result is of particular importance to the discussion of nonstationary time series 
presented in Chapter 4. 
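
As a worked instance of the autocovariance of $L_t$, consider the first difference of a stationary process, which corresponds to n = 2 with $l_1 = 1$ and $l_2 = -1$. Writing the double sum with the index convention $\gamma_{|k+j-i|}$ (an assumption about the notation made for this sketch), the four terms (i, j) = (1, 1), (1, 2), (2, 1), (2, 2) give

$$\operatorname{cov}[\nabla z_t, \nabla z_{t-k}] = \gamma_k - \gamma_{k+1} - \gamma_{|k-1|} + \gamma_k = 2\gamma_k - \gamma_{k+1} - \gamma_{|k-1|}$$

and, setting k = 0,

$$\operatorname{var}[\nabla z_t] = 2\gamma_0 - 2\gamma_1 = 2\gamma_0 (1 - \rho_1)$$

Both expressions depend only on the lag k and not on t, which is what stationarity of the differenced series requires.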

The result also extends to infinite linear operations or infinite linear (time-invariant) 
filters applied to a stationary process $\{z_t\}$, under a condition of absolute summability. That 
is, if $\{z_t\}$ is a stationary process and $\{y_t\}$ is defined by the infinite linear (time-invariant) 
filter 

$$y_t = \psi_0 z_t + \psi_1 z_{t-1} + \psi_2 z_{t-2} + \cdots = \sum_{i=0}^{\infty} \psi_i z_{t-i} \tag{2.1.9}$$

with fixed coefficients $\{\psi_i\}$ such that $\sum_{i=0}^{\infty} |\psi_i| < \infty$, then $\{y_t\}$ is also stationary. The 
absolute summability condition, $\sum_{i=0}^{\infty} |\psi_i| < \infty$, guarantees that the variables $y_t$ in (2.1.9) 
are well-defined finite random variables (with probability one) and represent the limit of 
the sequence $\sum_{i=0}^{n} \psi_i z_{t-i}$ as $n \to \infty$. The variance of $y_t$ in (2.1.9) (taking $E[z_t] = 0$ for 
convenience) is 

$$\operatorname{var}[y_t] = E[y_t^2] = \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} \psi_i \psi_j \gamma_{|j-i|}$$

This variance is finite since 

$$\left| \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} \psi_i \psi_j \gamma_{|j-i|} \right| \le \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} |\psi_i|\,|\psi_j|\,|\gamma_{|j-i|}| \le \gamma_0 \left\{ \sum_{i=0}^{\infty} |\psi_i| \right\}^2 < \infty$$

The autocovariance of $y_t$ at any lag k > 0 is then 

$$\operatorname{cov}[y_t, y_{t-k}] = \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} \psi_i \psi_j \operatorname{cov}[z_{t-i}, z_{t-k-j}] = \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} \psi_i \psi_j \gamma_{|k+j-i|} \tag{2.1.10}$$

which converges by the dominated convergence result. 
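
The variance of a filtered process and its bound can be checked numerically. In this sketch the weights $\psi_i = 0.5^i$, the input autocovariances $\gamma_k = 2 (0.3)^k$, and the truncation at m = 30 terms are all hypothetical choices, and the infinite sums are approximated by finite ones.

```r
# Hypothetical absolutely summable weights, truncated at m terms.
m <- 30
psi <- 0.5^(0:(m - 1))

# Hypothetical autocovariances of the input process z_t.
gamma0 <- 2
gamma_z <- gamma0 * 0.3^(0:(m - 1))

# Double sum over i and j of psi_i * psi_j * gamma_|j-i|.
# toeplitz(gamma_z)[i, j] holds gamma_|j-i| and outer(psi, psi)[i, j] holds psi_i * psi_j.
var_y <- sum(outer(psi, psi) * toeplitz(gamma_z))

# Upper bound from the text: gamma_0 * (sum of |psi_i|)^2.
bound <- gamma0 * sum(abs(psi))^2

c(var_y = var_y, bound = bound)
```

Because the weights decay geometrically, the terms omitted by the truncation are far below machine precision for this choice of m.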


Gaussian Processes. If the probability distribution of observations associated with any set 
of times is a multivariate normal distribution, the process is called a normal or Gaussian 
process. Since the multivariate normal distribution is fully characterized by its moments 
of first and second order, the existence of a fixed mean $\mu$ and an autocovariance matrix 
$\Gamma_n$ of the form (2.1.7) for all n would be sufficient to ensure the stationarity of a Gaussian 
process. 


Weak Stationarity. We have seen that for a process to be strictly stationary, the whole 
probability structure must depend only on time differences. A less restrictive requirement, 
called weak stationarity of order f, is that the moments up to some order f depend only 
on time differences. For example, the existence of a fixed mean $\mu$ and an autocovariance 
matrix $\Gamma_n$ of the form (2.1.7) is sufficient to ensure stationarity up to second order. That 
is, a process $\{z_t\}$ is weakly stationary (of order 2), or second-order stationary, if the mean 
$E[z_t] = \mu$ is a fixed constant for all t and the autocovariances $\operatorname{cov}[z_t, z_{t+k}] = \gamma_k$ depend 
only on the time difference or time lag k for all t. Thus, second-order stationarity and an 
assumption of normality are sufficient to produce strict stationarity. 


White Noise Process. The most fundamental example of a stationary process is a sequence 
of independent and identically distributed random variables, denoted as $a_1, \ldots, a_t, \ldots$, 
which we also assume to have mean zero and variance $\sigma_a^2$. This process is strictly stationary 
and is referred to as a white noise process. Because independence implies that the $a_t$ are 
uncorrelated, its autocovariance function is simply 

$$\gamma_k = E[a_t a_{t+k}] = \begin{cases} \sigma_a^2 & k = 0 \\ 0 & k \neq 0 \end{cases}$$

If one concentrates only on the second-order properties, then a sequence of random 
variables $a_t$, which are uncorrelated, have mean zero, and common variance $\sigma_a^2$ has the same 
autocovariance function $\gamma_k$ as above, and is weakly (second-order) stationary. Such a 
process may also be referred to as a white noise process (in the weak sense), when the focus 
is only on the second-order properties. Although the white noise process has very basic 
properties, this process plays an important role in the building of processes with much more 
interesting and more complicated properties through linear filtering operations as in (2.1.8) 
and (2.1.9). 
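
A white noise series is easy to simulate, and its estimated autocorrelations at nonzero lags should all be close to zero. The sketch below assumes Gaussian white noise with $\sigma_a = 2$ and a series length of 500; these values, the seed, and the variable names are arbitrary choices for illustration.

```r
set.seed(123)
N <- 500
a <- rnorm(N, mean = 0, sd = 2)   # independent N(0, 4) variables

# Estimated autocorrelations at lags 1 to 10 (lag 0 is dropped).
r <- acf(a, lag.max = 10, plot = FALSE)$acf[-1]

round(r, 2)       # all values should be small
c(mean(a), var(a))  # should be near 0 and 4
```

Any other distribution with mean zero and finite variance could replace rnorm() here without changing the autocovariance structure.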

2.1.4 Autocovariance and Autocorrelation Functions 

It was seen in Section 2.1.2 that the autocovariance coefficient $\gamma_k$, at lag k, measures the 
covariance between two values $z_t$ and $z_{t+k}$ a distance k apart. The plot of $\gamma_k$ versus lag k 
is called the autocovariance function $\{\gamma_k\}$ of the stochastic process. Similarly, the plot of 
the autocorrelation coefficient $\rho_k$ as a function of the lag k is called the autocorrelation 
function $\{\rho_k\}$ of the process. Note that the autocorrelation function is dimensionless, that 
is, independent of the scale of measurement of the time series. Since $\gamma_k = \rho_k \sigma_z^2$, knowledge 
of the autocorrelation function $\{\rho_k\}$ and the variance $\sigma_z^2$ is equivalent to knowledge of the 
autocovariance function $\{\gamma_k\}$. 

The autocorrelation function, shown in Figure 2.5 as a plot of the diagonals of the 
autocorrelation matrix, reveals how the correlation between any two values of the series 
changes as their separation changes. Since $\rho_k = \rho_{-k}$, the autocorrelation function is 
necessarily symmetric about zero, and in practice it is only necessary to plot the positive 
half of this function. Figure 2.6 shows the positive half of the autocorrelation function 
given in Figure 2.5. When we speak of the autocorrelation function, we typically mean 
the positive half. In the past, the autocorrelation function has sometimes been called the 
correlogram. 

FIGURE 2.5 Autocorrelation matrix and corresponding autocorrelation function of a stationary 
process. 

FIGURE 2.6 Positive half of the autocorrelation function of Figure 2.5. 

From what has previously been shown, a normal stationary process $z_t$ is completely 
characterized by its mean $\mu$ and its autocovariance function $\{\gamma_k\}$, or equivalently by its 
mean $\mu$, variance $\sigma_z^2$, and autocorrelation function $\{\rho_k\}$. 

2.1.5 Estimation of Autocovariance and Autocorrelation Functions 

Up to now, we have only considered the theoretical autocorrelation function that describes a 
stochastic process. In practice, we have a finite time series $z_1, z_2, \ldots, z_N$ of N observations, 
from which we can only obtain estimates of the mean $\mu$ and the autocorrelations. The mean 
$\mu = E[z_t]$ is estimated as in (2.1.3) by the sample mean $\bar{z} = \sum_{t=1}^{N} z_t / N$. It is easy to see 
that $E[\bar{z}] = \mu$, so that $\bar{z}$ is an unbiased estimator of $\mu$. As a measure of precision of $\bar{z}$ as an 
estimator of $\mu$, we find that 

$$\operatorname{var}[\bar{z}] = \frac{1}{N^2} \sum_{t=1}^{N} \sum_{s=1}^{N} \gamma_{t-s} = \frac{\gamma_0}{N} \left[ 1 + 2 \sum_{k=1}^{N-1} \left( 1 - \frac{k}{N} \right) \rho_k \right]$$

A "large-sample" approximation for this variance is given by 

$$\operatorname{var}[\bar{z}] \simeq \frac{\gamma_0}{N} \left( 1 + 2 \sum_{k=1}^{\infty} \rho_k \right)$$

in the sense that $N \operatorname{var}[\bar{z}] \to \gamma_0 \left( 1 + 2 \sum_{k=1}^{\infty} \rho_k \right)$ as $N \to \infty$, assuming that $\sum_{k=-\infty}^{\infty} |\rho_k| < \infty$. 
Notice that the first factor in $\operatorname{var}[\bar{z}]$, $\gamma_0/N$, is the familiar expression for the variance 
of $\bar{z}$ obtained from independent random samples of size N, but the presence of 
autocorrelation among the $z_t$ values can substantially affect the precision of $\bar{z}$. For example, in the 
case where a stationary process has autocorrelations $\rho_k = \phi^{|k|}$, $|\phi| < 1$, the large-sample 
approximation for the variance of $\bar{z}$ becomes $\operatorname{var}[\bar{z}] \simeq (\gamma_0/N)[(1 + \phi)/(1 - \phi)]$, and the 
second factor can obviously differ substantially from 1. 
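
The size of the second factor is easy to tabulate. The sketch below assumes autocorrelations of the form $\rho_k = \phi^k$ and compares the finite-sample factor $1 + 2\sum_{k=1}^{N-1}(1 - k/N)\rho_k$ with its large-sample limit $(1 + \phi)/(1 - \phi)$. The function names and the particular values of $\phi$ and N are hypothetical choices for illustration.

```r
# Finite-sample inflation factor for var[z_bar] when rho_k = phi^k.
var_factor_exact <- function(phi, N) {
  k <- 1:(N - 1)
  1 + 2 * sum((1 - k / N) * phi^k)
}

# Large-sample limit of the same factor.
var_factor_large <- function(phi) (1 + phi) / (1 - phi)

phi_values <- c(-0.5, 0, 0.5, 0.9)
data.frame(
  phi   = phi_values,
  exact = sapply(phi_values, var_factor_exact, N = 100),
  large = sapply(phi_values, var_factor_large)
)
```

Positive autocorrelation inflates the variance of the sample mean relative to the independent case, while negative autocorrelation reduces it.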

TABLE 2.1 Estimated Autocorrelation Function of Batch Data 

  k     r_k      k     r_k      k     r_k
  1    -0.39     6    -0.05    11     0.11
  2     0.30     7     0.04    12    -0.07
  3    -0.17     8    -0.04    13     0.15
  4     0.07     9     0.00    14     0.04
  5    -0.10    10     0.01    15    -0.01

A number of estimates of the autocorrelation function have been suggested in the 
literature, and their properties are discussed by Jenkins and Watts (1968), among others. It 
is concluded that the most satisfactory estimate of the kth lag autocorrelation $\rho_k$ is 

$$r_k = \hat{\rho}_k = \frac{c_k}{c_0} \tag{2.1.11}$$

where 

$$c_k = \hat{\gamma}_k = \frac{1}{N} \sum_{t=1}^{N-k} (z_t - \bar{z})(z_{t+k} - \bar{z}), \qquad k = 0, 1, 2, \ldots, K \tag{2.1.12}$$

is the estimate of the autocovariance $\gamma_k$ and $\bar{z}$ is the sample mean of the time series. The 
values $r_k$ in (2.1.11) may be called the sample autocorrelation function. To obtain a useful 
estimate of the autocorrelation function in practice, we would typically need at least 50 
observations, and the estimated autocorrelations $r_k$ would be calculated for k = 0, 1, ..., K, 
where K was not larger than, say, N/4. 
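
The estimates (2.1.11) and (2.1.12) can be coded directly and compared with R's acf() function, which also uses the divisor N. This is a minimal sketch: sample_acf is a hypothetical helper name, and the series z is simulated noise standing in for real data.

```r
# Sample autocorrelations r_0, r_1, ..., r_K computed from the definitions.
sample_acf <- function(z, K) {
  N <- length(z)
  z_bar <- mean(z)
  c_k <- sapply(0:K, function(k) {
    # c_k as in (2.1.12): lagged cross-products divided by N
    sum((z[1:(N - k)] - z_bar) * (z[(1 + k):N] - z_bar)) / N
  })
  c_k / c_k[1]   # r_k = c_k / c_0 as in (2.1.11)
}

set.seed(42)
z <- rnorm(60)   # hypothetical series of N = 60 observations
K <- 15          # K = N/4

r_manual <- sample_acf(z, K)
r_builtin <- drop(acf(z, lag.max = K, plot = FALSE)$acf)

all.equal(r_manual, r_builtin)
```

Both vectors start with $r_0 = 1$, so the values corresponding to lags 1 to K are elements 2 to K + 1.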

The estimated autocorrelation function r k of the batch data in Figure 2.1 is given 
in Table 2.1 and plotted in Figure 2.7. The autocorrelation function is characterized by 
correlations that alternate in sign and tend to damp out with increasing lag. Autocorrelation 
functions of this kind are not uncommon in production data and can arise because of 
"carryover" effects. In this particular example, a high-yielding batch tended to produce 
tarry residues, which were not entirely removed from the vessel and adversely affected the 
yield of the next batch. 

Figure 2.7 and the autocorrelations shown in Table 2.1 were generated in R as follows: 

> Yield = read.table("SeriesF.txt",header=TRUE)
> ACF = acf(Yield,15)  # plots the sample ACF up to lag 15
> ACF  # retrieves the values shown in Table 2.1

2.1.6 Standard Errors of Autocorrelation Estimates 

To identify a model for a time series, using methods to be described in Chapter 6, it is 
useful to have a rough check on whether $\rho_k$ is effectively zero beyond a certain lag. For this 
purpose, we can use the following expression for the approximate variance of the estimated 
autocorrelation coefficient of a stationary normal process given by Bartlett (1946): 

$$\operatorname{var}[r_k] \simeq \frac{1}{N} \sum_{v=-\infty}^{\infty} \left( \rho_v^2 + \rho_{v+k}\rho_{v-k} - 4\rho_k \rho_v \rho_{v-k} + 2\rho_v^2 \rho_k^2 \right) \tag{2.1.13}$$

FIGURE 2.7 Estimated autocorrelation function of batch data. 

For example, if $\rho_k = \phi^{|k|}$ $(-1 < \phi < 1)$, that is, the autocorrelation function damps out 
exponentially, (2.1.13) gives 

$$\operatorname{var}[r_k] \simeq \frac{1}{N} \left[ \frac{(1 + \phi^2)(1 - \phi^{2k})}{1 - \phi^2} - 2k\phi^{2k} \right] \tag{2.1.14}$$

and in particular 

$$\operatorname{var}[r_1] \simeq \frac{1}{N}(1 - \phi^2)$$

For any process for which all the autocorrelations $\rho_v$ are zero for v > q, all terms except 
the first appearing in the right-hand side of (2.1.13) are zero when k > q. Thus, for the 
variance of the estimated autocorrelation $r_k$, at lags k greater than some value q beyond 
which the theoretical autocorrelation function may be deemed to have "died out", Bartlett's 
approximation gives 

$$\operatorname{var}[r_k] \simeq \frac{1}{N} \left\{ 1 + 2 \sum_{v=1}^{q} \rho_v^2 \right\}, \qquad k > q \tag{2.1.15}$$

To use this result in practice, the estimated autocorrelations $r_k$ (k = 1, 2, ..., q) are 
substituted for the theoretical autocorrelations $\rho_k$, and when this is done, we refer to the 
square root of (2.1.15) as the large-lag standard error. On the assumption that the $\rho_k$ are 
all zero beyond some lag k = q, the large-lag standard error approximates the standard 
deviation of $r_k$ for suitably large lags (k > q). We will show in Chapter 3 that the moving 
average (MA) process in (1.2.3) has a correlation structure such that the approximation 
(2.1.15) applies to this process. 

Similar expressions for the approximate covariance between the estimated 
autocorrelations $r_k$ and $r_{k+s}$ at two different lags k and k + s were also given by Bartlett (1946). In 
particular, the large-lag approximation reduces to 

$$\operatorname{cov}[r_k, r_{k+s}] \simeq \frac{1}{N} \sum_{v=-q}^{q} \rho_v \rho_{v+s}, \qquad k > q \tag{2.1.16}$$
This result shows that care is required in the interpretation of individual autocorrelations 
because large covariances can exist between neighboring values. This effect can sometimes 
distort the visual appearance of the sample autocorrelation function, which may fail to damp 
out according to expectation. 

A case of particular interest occurs for q = 0, that is, when the $\rho_k$ are taken to be zero for 
all lags (other than lag 0), and hence the series is completely random or white noise. Then, 
the standard errors from (2.1.15) for estimated autocorrelations $r_k$ take the simple form 

$$\operatorname{se}[r_k] \simeq \frac{1}{\sqrt{N}}, \qquad k > 0$$

In addition, in this case the result in (2.1.16) indicates that estimated autocorrelations $r_k$ 
and $r_{k+s}$ at two different lags are not correlated, and since the $r_k$ are also known to be 
approximately normally distributed for large N, a collection of estimated autocorrelations 
for different lags will tend to be independently and normally distributed with mean 0 and 
variance 1/N. 

Limits of two standard errors, determined under the assumption that the series is completely 
random, are included for the autocorrelation function of the batch data in Figure 2.7. Since 
N equals 70 in this case, the two-standard-error limits are around ±0.24. The magnitudes 
of the estimated autocorrelation coefficients are clearly inconsistent with the assumption 
that the series is white noise. 
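
The limits quoted for the batch data follow from a one-line calculation. The sketch below computes the two-standard-error limit for N = 70 and flags which of the first five autocorrelations in Table 2.1 fall outside it; the variable names are arbitrary.

```r
N <- 70
limit <- 2 / sqrt(N)   # two standard errors under the white noise assumption

# First five estimated autocorrelations of the batch data, from Table 2.1.
r <- c(-0.39, 0.30, -0.17, 0.07, -0.10)

round(limit, 2)
which(abs(r) > limit)   # lags whose estimates lie outside the limits
```

The estimates at lags 1 and 2 exceed the limit in absolute value, which is the basis for rejecting the white noise assumption for this series.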

Example. For further illustration, assume that the following estimated autocorrelations 
were obtained from a time series of length N = 200 observations, generated from a 
stochastic process for which it was known that $\rho_1 = -0.4$ and $\rho_k = 0$ for $k \geq 2$: 

  k     r_k      k     r_k
  1    -0.38     6     0.00
  2    -0.08     7     0.00
  3     0.11     8     0.00
  4    -0.08     9     0.07
  5     0.02    10    -0.08
On the assumption that the series is completely random, that is, white noise, we have 
q = 0. Then, for all lags, (2.1.15) yields 

$$\operatorname{var}[r_k] \simeq \frac{1}{N} = \frac{1}{200} = 0.005$$

The corresponding standard error is $0.07 = (0.005)^{1/2}$. Since the value of -0.38 for $r_1$ 
is over five times this standard error, it can be concluded that $\rho_1$ is nonzero. Moreover, 
the estimated autocorrelations for lags greater than 1 are all small. Therefore, it might 
be reasonable to ask next whether the series was compatible with a hypothesis (whose 
relevance will be discussed later) whereby $\rho_1$ was nonzero, but $\rho_k = 0$ $(k \geq 2)$. Using 
(2.1.15) with q = 1 and substituting $r_1$ for $\rho_1$, the estimated large-lag variance under this 
assumption is 

$$\operatorname{var}[r_k] \simeq \frac{1}{200} \left[ 1 + 2(-0.38)^2 \right] = 0.0064, \qquad k > 1$$

yielding a standard error of 0.08. Since the estimated autocorrelations for lags greater than 
1 are small compared with this standard error, there is no reason to doubt the adequacy of 
the model $\rho_1 \neq 0$, $\rho_k = 0$ $(k \geq 2)$. 
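
The two variance calculations in this example can be reproduced with a small helper. The function name large_lag_var and its argument names are hypothetical; it implements the large-lag approximation (2.1.15) with estimated autocorrelations substituted for the theoretical ones.

```r
# Large-lag variance of r_k for k > q, with r_1, ..., r_q in place of rho_1, ..., rho_q.
large_lag_var <- function(r, q, N) {
  (1 + 2 * sum(r[seq_len(q)]^2)) / N   # seq_len(0) is empty, so q = 0 gives 1/N
}

# Estimated autocorrelations from the example, lags 1 to 10.
r <- c(-0.38, -0.08, 0.11, -0.08, 0.02, 0.00, 0.00, 0.00, 0.07, -0.08)
N <- 200

v0 <- large_lag_var(r, q = 0, N = N)   # white noise assumption
v1 <- large_lag_var(r, q = 1, N = N)   # only rho_1 assumed nonzero

round(c(var_q0 = v0, se_q0 = sqrt(v0), var_q1 = v1, se_q1 = sqrt(v1)), 4)

# Lags beyond 1 whose estimates exceed two large-lag standard errors (none expected).
which(abs(r[-1]) > 2 * sqrt(v1)) + 1
```

The same helper can be applied lag by lag, increasing q until the remaining estimates are all small relative to their standard error.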

Remark. The limits shown in Figure 2.7, which assume that the series is white noise, are 
generated by default in R. Alternative limits, consistent with the assumptions underlying 
(2.1.15), can be obtained by adding the argument ci.type="ma" to the acf() command. 


2.2 SPECTRAL PROPERTIES OF STATIONARY MODELS 
2.2.1 Periodogram of a Time Series 

Another way of analyzing a time series is based on the assumption that it is made up of 
sine and cosine waves with different frequencies. A device that uses this idea, introduced 
by Schuster (1898), is the periodogram. The periodogram was originally used to detect and 
estimate the amplitude of a sine component, of known frequency, buried in noise. We will 
use it later to provide a check on the randomness of a series (usually, a series of residuals 
after fitting a particular model), where we consider the possibility that periodic components 
of unknown frequency may remain in the series. 

To illustrate the calculation of the periodogram, suppose that the number of observations 
N = 2q + 1 is odd. We consider fitting the Fourier series model 

$$z_t = \alpha_0 + \sum_{i=1}^{q} (\alpha_i c_{it} + \beta_i s_{it}) + e_t \tag{2.2.1}$$

where $c_{it} = \cos(2\pi f_i t)$, $s_{it} = \sin(2\pi f_i t)$, and $f_i = i/N$, which is the ith harmonic of the 
fundamental frequency 1/N, is associated with the ith sine wave component in (2.2.1) with 
frequency $f_i$ and period $1/f_i = N/i$. The least squares estimates of the coefficients $\alpha_0$ and 
$(\alpha_i, \beta_i)$ will be 

$$a_0 = \bar{z} \tag{2.2.2}$$

$$a_i = \frac{2}{N} \sum_{t=1}^{N} z_t c_{it} \tag{2.2.3}$$

$$b_i = \frac{2}{N} \sum_{t=1}^{N} z_t s_{it} \tag{2.2.4}$$

for i = 1, 2, ..., q, since $\sum_{t=1}^{N} c_{it}^2 = \sum_{t=1}^{N} s_{it}^2 = N/2$, and all terms in (2.2.1) are mutually orthogonal over 
t = 1, ..., N. The periodogram then consists of the q = (N - 1)/2 values 

$$I(f_i) = \frac{N}{2} (a_i^2 + b_i^2), \qquad i = 1, 2, \ldots, q \tag{2.2.5}$$

where $I(f_i)$ is called the intensity at frequency $f_i$. When N is even, we set N = 2q and 
(2.2.2)-(2.2.5) apply for i = 1, 2, ..., (q - 1), but 

$$a_q = \frac{1}{N} \sum_{t=1}^{N} (-1)^t z_t$$

$$b_q = 0$$

and 

$$I(f_q) = I(0.5) = N a_q^2$$

Note that the highest frequency is 0.5 cycle per time interval because the smallest period is 
two intervals. 
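
The periodogram calculation for odd N can be sketched as follows. The function name periodogram_odd and the simulated series are hypothetical. The sketch evaluates the coefficient estimates and intensities from their definitions and then cross-checks the intensities against the discrete Fourier transform, using the identity $I(f_i) = (2/N)\,|\sum_t z_t e^{-2\pi \mathrm{i} f_i t}|^2$, which follows from the definitions of $a_i$ and $b_i$.

```r
# Periodogram ordinates for a series of odd length N = 2q + 1.
periodogram_odd <- function(z) {
  N <- length(z)
  stopifnot(N %% 2 == 1)   # the formulas used here assume odd N
  q <- (N - 1) / 2
  tt <- seq_len(N)
  a <- b <- numeric(q)
  for (i in seq_len(q)) {
    f_i <- i / N
    a[i] <- (2 / N) * sum(z * cos(2 * pi * f_i * tt))   # cosine coefficient
    b[i] <- (2 / N) * sum(z * sin(2 * pi * f_i * tt))   # sine coefficient
  }
  data.frame(freq = seq_len(q) / N, a = a, b = b,
             intensity = (N / 2) * (a^2 + b^2))
}

set.seed(7)
z <- rnorm(11)   # hypothetical series with N = 11, so q = 5
pg <- periodogram_odd(z)

# Cross-check with the FFT: elements 2 to q + 1 correspond to harmonics 1 to q.
I_fft <- (2 / length(z)) * Mod(fft(z)[2:6])^2
all.equal(pg$intensity, I_fft)
```

For long series the FFT route is much faster than the explicit loop, while the loop makes the link to the regression coefficients visible.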

2.2.2 Analysis of Variance 

In an analysis of variance table associated with the fitted regression (2.2.1), when N is odd, 
we can isolate (N - 1)/2 pairs of degrees of freedom, after eliminating the mean. These 
are associated with the pairs of coefficients $(a_1, b_1), (a_2, b_2), \ldots, (a_q, b_q)$, and hence with 
the frequencies 1/N, 2/N, ..., q/N. The periodogram $I(f_i) = (N/2)(a_i^2 + b_i^2)$ is seen to 
be simply the "sum of squares" associated with the pair of coefficients $(a_i, b_i)$ and hence 
with the frequency $f_i = i/N$ or period $p_i = N/i$. Thus, 

$$\sum_{t=1}^{N} (z_t - \bar{z})^2 = \sum_{i=1}^{q} I(f_i) \tag{2.2.6}$$

When N is even, there are (N - 2)/2 pairs of degrees of freedom and a further single 
degree of freedom associated with the coefficient $a_q$. 
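
The decomposition of the total sum of squares into periodogram intensities can be verified numerically for odd N. In this sketch the series is simulated noise of length 25 (an arbitrary choice), and the intensities are obtained from the FFT under the assumption that $I(f_i) = (2/N)$ times the squared modulus of the discrete Fourier transform at harmonic i.

```r
set.seed(11)
z <- rnorm(25)          # hypothetical series, N = 25 (odd)
N <- length(z)
q <- (N - 1) / 2

# Intensities at the harmonics f_i = i/N, i = 1, ..., q.
I_f <- (2 / N) * Mod(fft(z)[2:(q + 1)])^2

# Total sum of squares about the mean.
total_ss <- sum((z - mean(z))^2)

c(sum_of_intensities = sum(I_f), total_ss = total_ss)
all.equal(sum(I_f), total_ss)
```

Each intensity is therefore the share of the total variation attributed to one harmonic frequency.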

If the series were truly random, containing no systematic sinusoidal component, that is,

$$z_t = \alpha_0 + e_t$$

with $\alpha_0$ the fixed mean, and the e's independent and normal, with mean zero and variance $\sigma^2$,
each component $I(f_i)$ would have expectation $2\sigma^2$ and would be distributed¹ as $\sigma^2 \chi^2(2)$,
independently of all other components. By contrast, if the series contained a systematic
sine component having frequency $f_i$, amplitude A, and phase angle F, so that

$$z_t = \alpha_0 + \alpha \cos(2\pi f_i t) + \beta \sin(2\pi f_i t) + e_t$$

with $A \sin F = \alpha$ and $A \cos F = \beta$, the sum of squares $I(f_i)$ would tend to be inflated since
its expected value would be $2\sigma^2 + N(\alpha^2 + \beta^2)/2 = 2\sigma^2 + NA^2/2$.

In practice, it is unlikely that the frequency f of an unknown systematic sine component
would exactly match any of the frequencies $f_i$ for which intensities have been calculated.
In this case the periodogram would show an increase in the intensities in the immediate
vicinity of f.


¹It is to be understood that $\chi^2(m)$ refers to a random variable having a chi-square distribution with m degrees of
freedom, defined explicitly, for example, in Appendix A7.1.
TABLE 2.2 Mean Monthly Temperatures for Central England in 1964

  t     z_t     c_1t        t     z_t     c_1t
  1     3.4     0.87        7    16.1    -0.87
  2     4.5     0.50        8    15.5    -0.50
  3     4.3     0.00        9    14.1     0.00
  4     8.7    -0.50       10     8.9     0.50
  5    13.3    -0.87       11     7.4     0.87
  6    13.8    -1.00       12     3.6     1.00

Example. A large number of observations would generally be used in calculation of the
periodogram. However, to illustrate the details of the calculation, we use the set of 12
mean monthly temperatures (in degrees Celsius) for central England during 1964, given in
Table 2.2. The table gives $c_{1t} = \cos(2\pi t/12)$, which is required in the calculation of $a_1$,
obtained from

$$a_1 = \frac{1}{6}\left[(3.4)(0.87) + \cdots + (3.6)(1.00)\right] = -5.30$$

The values of the $a_i, b_i$, $i = 1, 2, \ldots, 6$, are given in Table 2.3 and yield the analysis of
variance of Table 2.4. As would be expected, the major component of these temperature
data has a period of 12 months, that is, a frequency of 1/12 cycle per month.
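
The calculation in this example can be checked numerically. The R sketch below is not part of the original text. It assumes that the least squares formulas (2.2.3)-(2.2.5) take the usual form $a_i = (2/N)\sum z_t \cos(2\pi f_i t)$, $b_i = (2/N)\sum z_t \sin(2\pi f_i t)$, with the even-N adjustment for $i = q$ given above; the object names (z, a, b, pgram, anova_table) are illustrative only. Because exact cosines are used rather than the two-decimal values of Table 2.2, the computed $a_1$ differs slightly from the value -5.30 quoted above.

```r
# Mean monthly temperatures for central England, 1964 (Table 2.2)
z  <- c(3.4, 4.5, 4.3, 8.7, 13.3, 13.8, 16.1, 15.5, 14.1, 8.9, 7.4, 3.6)
N  <- length(z)      # N = 12 is even, so q = N / 2
q  <- N / 2
tt <- seq_len(N)

a <- numeric(q); b <- numeric(q); pgram <- numeric(q)
for (i in seq_len(q)) {
  f <- i / N
  if (i < q) {
    # Fourier coefficients and intensity for harmonics below 0.5 (assumed form)
    a[i] <- (2 / N) * sum(z * cos(2 * pi * f * tt))
    b[i] <- (2 / N) * sum(z * sin(2 * pi * f * tt))
    pgram[i] <- (N / 2) * (a[i]^2 + b[i]^2)
  } else {
    # Special case i = q when N is even
    a[i] <- sum((-1)^tt * z) / N
    b[i] <- 0
    pgram[i] <- N * a[i]^2
  }
}

# Analysis of variance layout: one row per harmonic
anova_table <- data.frame(i = 1:q, frequency = (1:q) / N, period = N / (1:q),
                          periodogram = round(pgram, 2),
                          df = c(rep(2, q - 1), 1))
print(anova_table)

# Identity (2.2.6): the intensities add up to the corrected sum of squares
total_ss <- sum((z - mean(z))^2)
print(c(sum_of_intensities = sum(pgram), total_ss = total_ss))
```

The last line compares the two sides of (2.2.6); they should agree to rounding error.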

2.2.3 Spectrum and Spectral Density Function 

For completeness, we add here a brief discussion of the spectrum and spectral density 
function. The use of these important tools is described more fully by Jenkins and Watts 
(1968), Bloomfield (2000), and Shumway and Stoffer (2011, Chapter 4), among others.
We do not apply them to the analysis of time series in this book, and this section can be 
omitted on first reading. 

TABLE 2.3 Amplitudes of Sines and Cosines at
Different Harmonics for Temperature Data

  i      a_i      b_i
  1    -5.30    -3.82
  2     0.05     0.17
  3     0.10     0.50
  4     0.52    -0.52
  5     0.09    -0.58
  6    -0.30


TABLE 2.4 Analysis of Variance Table for Temperature Data

  i     Frequency f_i   Period   Periodogram I(f_i)   Degrees of Freedom   Mean Square
  1         1/12          12          254.96                  2               127.48
  2         1/6            6            0.19                  2                 0.10
  3         1/4            4            1.56                  2                 0.78
  4         1/3            3            3.22                  2                 1.61
  5         5/12         12/5           2.09                  2                 1.05
  6         1/2            2            1.08                  1                 1.08
  Total                               263.10                 11                23.92


Sample Spectrum. The definition of the periodogram in (2.2.5) assumes that the frequencies
$f_i = i/N$ are harmonics of the fundamental frequency $1/N$. By way of introduction
to the spectrum, we relax this assumption and allow the frequency f to vary continuously
in the range of 0-0.5 cycle. The definition (2.2.5) of the periodogram may be modified to

$$I(f) = \frac{N}{2}\left(a_f^2 + b_f^2\right) \qquad 0 \le f \le \tfrac{1}{2} \tag{2.2.7}$$


and $I(f)$ is then referred to as the sample spectrum (Jenkins and Watts, 1968). Like the
periodogram, it can be used to detect and estimate the amplitude of a sinusoidal component
of unknown frequency f buried in noise and is, indeed, a more appropriate tool for this
purpose if it is known that the frequency f is not harmonically related to the length of the
series. Moreover, it provides a starting point for the theory of spectral analysis, using a
result given in Appendix A2.1. This result shows that the sample spectrum $I(f)$ and the
estimate $c_k$ of the autocovariance function are linked by the important relation

$$I(f) = 2\left[c_0 + 2\sum_{k=1}^{N-1} c_k \cos(2\pi f k)\right] \qquad 0 \le f \le \tfrac{1}{2} \tag{2.2.8}$$

That is, the sample spectrum is the Fourier cosine transform of the estimate of the
autocovariance function.
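
Relation (2.2.8) can be illustrated numerically. The following R sketch, which is not part of the original text, evaluates the sample spectrum of the temperature data of Table 2.2 in two ways: directly from the mean-corrected Fourier sums, and from the autocovariance estimates $c_k$ (divisor N, as returned by acf() in the stats package). The function names spec_direct and spec_acov and the chosen frequencies are illustrative assumptions.

```r
# Temperature data of Table 2.2
z <- c(3.4, 4.5, 4.3, 8.7, 13.3, 13.8, 16.1, 15.5, 14.1, 8.9, 7.4, 3.6)
N <- length(z)

# Route 1: I(f) = (N/2) |d_f|^2, with d_f = (2/N) sum (z_t - zbar) exp(-i 2 pi f t)
spec_direct <- function(f) {
  d <- (2 / N) * sum((z - mean(z)) * exp(-2i * pi * f * seq_len(N)))
  (N / 2) * Mod(d)^2
}

# Route 2: equation (2.2.8), using c_0, ..., c_{N-1} with divisor N
ck <- drop(acf(z, lag.max = N - 1, type = "covariance", plot = FALSE)$acf)
spec_acov <- function(f) {
  k <- seq_len(N - 1)
  2 * (ck[1] + 2 * sum(ck[-1] * cos(2 * pi * f * k)))
}

# One harmonic frequency (1/12) and two non-harmonic frequencies
f_grid <- c(1 / 12, 0.13, 0.37)
print(cbind(f = f_grid,
            direct = sapply(f_grid, spec_direct),
            from_acov = sapply(f_grid, spec_acov)))
```

The two columns should agree at every frequency, and at $f = 1/12$ they should reproduce the periodogram value of Table 2.4.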


Spectrum. The periodogram and sample spectrum are appropriate tools for analyzing time
series made up of mixtures of sine and cosine waves, at fixed frequencies buried in noise.
However, stationary time series of the kind described in Section 2.1 are characterized by
random changes of frequency, amplitude, and phase. For this type of series, the sample
spectrum $I(f)$ fluctuates wildly and is not capable of any meaningful interpretation.

However, suppose that the sample spectrum was calculated for a time series of N
observations, which is a realization of a stationary normal process. As already mentioned,
such a process would not have any cosine or sine deterministic components, but we could
formally carry through the Fourier analysis and obtain values of $(a_f, b_f)$ for any given
frequency f. If repeated realizations of N observations were taken from the stochastic
process, we could build up a population of values for $a_f$, $b_f$, and $I(f)$. Thus, we could
calculate the mean value of $I(f)$ in repeated realizations of size N, namely,

$$E[I(f)] = 2\left[E[c_0] + 2\sum_{k=1}^{N-1} E[c_k] \cos(2\pi f k)\right] \tag{2.2.9}$$

For large N, it may be shown (e.g., Jenkins and Watts, 1968) that the mean value of the
estimate $c_k$ of the autocovariance coefficient in repeated realizations tends to the theoretical
autocovariance $\gamma_k$, that is,

$$\lim_{N \to \infty} E[c_k] = \gamma_k$$

On taking the limit of (2.2.9) as N tends to infinity, the power spectrum $p(f)$ is defined
by

$$p(f) = \lim_{N \to \infty} E[I(f)] = 2\left[\gamma_0 + 2\sum_{k=1}^{\infty} \gamma_k \cos(2\pi f k)\right] \qquad 0 \le f \le \tfrac{1}{2} \tag{2.2.10}$$

We note that since

$$|p(f)| \le 2\left[|\gamma_0| + 2\sum_{k=1}^{\infty} |\gamma_k| \, |\cos(2\pi f k)|\right] \le 2\left(|\gamma_0| + 2\sum_{k=1}^{\infty} |\gamma_k|\right) \tag{2.2.11}$$

a sufficient condition for the spectrum to converge is that $\gamma_k$ damps out rapidly enough for
the series (2.2.11) to converge. Since the power spectrum is the Fourier cosine transform of
the autocovariance function, knowledge of the autocovariance function is mathematically
equivalent to knowledge of the spectrum, and vice versa. From now on, we refer to the
power spectrum as simply the spectrum.

On integrating (2.2.10) between the limits 0 and 1/2, the variance of the process $z_t$ is

$$\gamma_0 = \sigma_z^2 = \int_0^{1/2} p(f)\,df \tag{2.2.12}$$

Hence, in the same way that the periodogram $I(f)$ shows how the variance (2.2.6) of
a series, consisting of mixtures of sines and cosines, is distributed between the various
distinct harmonic frequencies, the spectrum $p(f)$ shows how the variance of a stochastic
process is distributed between a continuous range of frequencies. One can interpret $p(f)\,df$
as measuring approximately the variance of the process in the frequency range of f to
$f + df$. In addition, from the definition in (2.2.10), the spectral representation for the
autocovariance function $\{\gamma_k\}$ can be obtained as

$$\gamma_k = \int_0^{1/2} \cos(2\pi f k)\, p(f)\,df$$

which together with (2.2.10) directly exhibits the one-to-one correspondence between the
power spectrum and the autocovariance function of a process. Conversely, since the $\gamma_k$
form a positive-definite sequence, provided the series (2.2.11) converges, it follows from
Herglotz's theorem (see, e.g., Loève, 1977) that a unique function $p(f)$ exists such that
$\gamma_k$ have the spectral representation $\gamma_k = \frac{1}{2}\int_{-1/2}^{1/2} e^{i2\pi f k} p(f)\,df$. Consequently, the power
spectrum $p(f)$ of a stationary process, for which (2.2.11) converges, can be defined as this
unique function, which is guaranteed to exist and must have the form of the right-hand side
of (2.2.10) by the spectral representation.
The fundamental property of the spectrum that $p(f) \ge 0$ for all $0 \le f \le \frac{1}{2}$ follows from
$I(f) \ge 0$ and the definition in (2.2.10). In fact, a function $p(f)$ defined on $0 \le f \le \frac{1}{2}$ can
be the spectrum of a stationary process if and only if it satisfies $p(f) \ge 0$ for $0 \le f \le \frac{1}{2}$
and $\int_0^{1/2} p(f)\,df < \infty$. Conversely, a sequence $\{\gamma_k\}_{k=0}^{\infty}$ can be the autocovariance function
of a stationary process if and only if $\{\gamma_k\}$ is a nonnegative-definite sequence, and this is
equivalent to the condition that $p(f) \ge 0$, $0 \le f \le \frac{1}{2}$, with $p(f)$ defined by (2.2.10).

Spectral Density Function. It is sometimes more convenient to base the definition (2.2.10)
of the spectrum on the autocorrelations $\rho_k$ rather than on the autocovariances $\gamma_k$. The
resulting function

$$g(f) = \frac{p(f)}{\sigma_z^2} = 2\left[1 + 2\sum_{k=1}^{\infty} \rho_k \cos(2\pi f k)\right] \qquad 0 \le f \le \tfrac{1}{2} \tag{2.2.13}$$

is called the spectral density function. Using (2.2.12), it is seen that the spectral density
function has the property

$$\int_0^{1/2} g(f)\,df = 1$$

Since $g(f)$ is also positive, it has the same properties as an ordinary probability density
function. This analogy extends to the estimation properties of these two functions, as we
discuss next.

Estimation of the Spectrum. One would expect that an estimate of the spectrum could be
obtained from (2.2.10), by replacing the theoretical autocovariances $\gamma_k$ with their estimates
$c_k$. Because of (2.2.8), this corresponds to taking the sample spectrum as an estimate of
$p(f)$. However, it can be shown (e.g., Jenkins and Watts, 1968) that the sample spectrum
of a stationary time series fluctuates violently about the theoretical spectrum. An intuitive
explanation of this fact is that the sample spectrum corresponds to using an interval, in the
frequency domain, whose width is too small. This is analogous to using too small a group
interval for the histogram when estimating an ordinary probability distribution. By using a
modified or smoothed estimate

$$\hat{p}(f) = 2\left[c_0 + 2\sum_{k=1}^{N-1} \lambda_k c_k \cos(2\pi f k)\right] \tag{2.2.14}$$

where the $\lambda_k$ are suitably chosen weights called a lag window, it is possible to increase
the bandwidth of the estimate and to obtain a smoother estimate of the spectrum. The
weights $\lambda_k$ in (2.2.14) are typically chosen so that they die out to zero for lags $k > M$,
where M is known as the truncation point and $M < N$ is moderately small in relation to
series length N. As an alternative computational form, one can also obtain an estimate of
the spectrum smoother than the sample spectrum $I(f)$ by forming a weighted average of
a number of periodogram values $I(f_{i+j})$ in a small neighborhood of frequencies around a
given frequency $f_i$. Specifically, a smoothed periodogram estimator of $p(f_i)$ takes the form

$$\hat{p}(f_i) = \sum_{j=-m}^{m} W(f_j)\, I\!\left(f_i + \frac{j}{N}\right)$$

where $\sum_{j=-m}^{m} W(f_j) = 1$, the symmetric weighting function $W(f_j)$ is referred to as the
spectral window, and m is chosen to be much smaller than $N/2$.

FIGURE 2.8 Estimated power spectrum of batch data.

Figure 2.8 shows an estimate of the spectrum of the batch data. It is seen that most 
of the variance of the series is concentrated at high frequencies. This is due to the rapid 
oscillations in the original series, shown in Figure 2.1. 
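
As an illustration of the lag window estimate (2.2.14), the following R sketch compares the raw sample spectrum with a smoothed version. It is not taken from the original text. The series is a simulated stand-in rather than the batch data, the Bartlett (triangular) window $\lambda_k = 1 - k/M$ for $k \le M$ is only one possible choice of lag window, and the truncation point M = 20 and the function name lag_window_spectrum are arbitrary assumptions.

```r
set.seed(1)
# Hypothetical test series (an MA(1) with rapid oscillations), not the batch data
z <- arima.sim(list(ma = -0.5), n = 200)
N <- length(z)

# Autocovariance estimates c_0, ..., c_{N-1} with divisor N
ck <- drop(acf(z, lag.max = N - 1, type = "covariance", plot = FALSE)$acf)

# Equation (2.2.14) for a given frequency f and weights lambda_1, ..., lambda_{N-1}
lag_window_spectrum <- function(f, lambda) {
  k <- seq_len(N - 1)
  2 * (ck[1] + 2 * sum(lambda * ck[-1] * cos(2 * pi * f * k)))
}

M <- 20                                   # truncation point (assumed value)
k <- seq_len(N - 1)
bartlett <- pmax(1 - k / M, 0)            # weights die out to zero for k >= M

freqs    <- seq(0, 0.5, by = 0.01)
p_raw    <- sapply(freqs, lag_window_spectrum, lambda = rep(1, N - 1))  # sample spectrum
p_smooth <- sapply(freqs, lag_window_spectrum, lambda = bartlett)      # smoothed estimate

plot(freqs, p_raw, type = "l", lty = 3, xlab = "Frequency", ylab = "Spectrum estimate")
lines(freqs, p_smooth, lwd = 2)
```

Setting all weights equal to 1 recovers the sample spectrum (2.2.8); reducing M widens the bandwidth and gives a smoother but less detailed estimate.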


Remark. The command spectrum() can be used to estimate the power spectrum in R. 
To use this command, a smoothing window must be specified; see help(spectrum) and 
the references therein for details. The following command will generate a graph roughly 
similar to Figure 2.8: 


spectrum(Yield,spans=c(7,7),taper=0) 


As an alternative, the R program spec.ar() fits an autoregressive model of order p to the 
series and computes the spectral density of the fitted model. The lag order p is selected 
using a model selection criterion such as the AIC to be discussed in Chapter 6. 


2.2.4 Simple Examples of Autocorrelation and Spectral Density Functions 

For illustration, we now show equivalent representations of two simple stationary stochastic 
processes based on: 

1. Their theoretical models 

2. Their theoretical autocorrelation functions 

3. Their theoretical spectra 

Consider the two processes

$$z_t = 10 + a_t + a_{t-1} \qquad\qquad z_t = 10 + a_t - a_{t-1}$$

where $a_t, a_{t-1}, \ldots$ are a sequence of uncorrelated normal random variables with mean
zero and variance $\sigma_a^2$, that is, Gaussian white noise. From the result in Section 2.1.3 on
stationarity of linear functions, it is clear that the two processes above are stationary. Using
the definition (2.1.5),

$$\gamma_k = \mathrm{cov}[z_t, z_{t+k}] = E[(z_t - \mu)(z_{t+k} - \mu)]$$

where $E[z_t] = E[z_{t+k}] = \mu = 10$, and the autocovariances of these two stochastic
processes are obtained from

$$\gamma_k = \mathrm{cov}[a_t + a_{t-1}, a_{t+k} + a_{t+k-1}]$$
$$\phantom{\gamma_k} = \mathrm{cov}[a_t, a_{t+k}] + \mathrm{cov}[a_t, a_{t+k-1}] + \mathrm{cov}[a_{t-1}, a_{t+k}] + \mathrm{cov}[a_{t-1}, a_{t+k-1}]$$

and $\gamma_k = \mathrm{cov}[a_t - a_{t-1}, a_{t+k} - a_{t+k-1}]$, respectively. Hence, the autocovariances are

$$\gamma_k = \begin{cases} 2\sigma_a^2 & k = 0 \\ \sigma_a^2 & k = 1 \\ 0 & k \ge 2 \end{cases} \qquad\qquad \gamma_k = \begin{cases} 2\sigma_a^2 & k = 0 \\ -\sigma_a^2 & k = 1 \\ 0 & k \ge 2 \end{cases}$$

Thus, the theoretical autocorrelation functions are

$$\rho_k = \begin{cases} 0.5 & k = 1 \\ 0.0 & k \ge 2 \end{cases} \qquad\qquad \rho_k = \begin{cases} -0.5 & k = 1 \\ 0.0 & k \ge 2 \end{cases}$$

and using (2.2.13), the theoretical spectral density functions are

$$g(f) = 2[1 + \cos(2\pi f)] \qquad\qquad g(f) = 2[1 - \cos(2\pi f)]$$
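
These results can be checked numerically. The R sketch below is not part of the original text: it uses ARMAacf() from the stats package for the theoretical autocorrelations and integrate() for the integrals of the spectral densities. The names g1 and g2, and the check that the lag 1 autocorrelation is recovered as $\int_0^{1/2} \cos(2\pi f)\, g(f)\,df$ (the spectral representation of $\gamma_k$ divided by $\gamma_0$), are illustrative additions.

```r
# Theoretical autocorrelations at lags 0, 1, 2, 3 (R's MA sign convention: z_t = a_t + ma * a_{t-1})
rho_model1 <- ARMAacf(ma =  1.0, lag.max = 3)
rho_model2 <- ARMAacf(ma = -1.0, lag.max = 3)
print(rbind(model1 = rho_model1, model2 = rho_model2))

# Spectral density functions of the two processes
g1 <- function(f) 2 * (1 + cos(2 * pi * f))
g2 <- function(f) 2 * (1 - cos(2 * pi * f))

# Each spectral density integrates to 1 over 0 <= f <= 1/2
area1 <- integrate(g1, 0, 0.5)$value
area2 <- integrate(g2, 0, 0.5)$value

# The lag 1 autocorrelation recovered from the spectral density
rho1_from_g1 <- integrate(function(f) cos(2 * pi * f) * g1(f), 0, 0.5)$value
rho1_from_g2 <- integrate(function(f) cos(2 * pi * f) * g2(f), 0, 0.5)$value
print(c(area1 = area1, area2 = area2, rho1_g1 = rho1_from_g1, rho1_g2 = rho1_from_g2))
```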


The autocorrelation functions and spectral density functions are plotted in Figure 2.9 
together with a sample time series from each process. 


1. It should be noted that for these two stationary processes, knowledge of either the
autocorrelation function or the spectral density function, with the mean and variance
of the process, is equivalent to knowledge of the model (given the normality
assumption).

2. It will be seen that the autocorrelation function reflects one aspect of the behavior
of the series. The comparatively smooth nature of the first series is accounted for by
the positive association between successive values. The alternating tendency of the
second series, in which positive deviations usually follow negative ones, is accounted
for by the negative association between successive values.

3. The spectral density throws light on a different but equivalent aspect. The predominance
of low frequencies in the first series and high frequencies in the second is
shown by the spectra.


FIGURE 2.9 Two simple stochastic models with their corresponding theoretical autocorrelation
functions and spectral density functions. [Panels: Model (1): $z_t = 10 + a_t + a_{t-1}$ and
Model (2): $z_t = 10 + a_t - a_{t-1}$; autocorrelations plotted against lag 0 to 10.]


Remark. The two models considered in Figure 2.9 are special cases of the moving average
model defined in (1.2.3). Specifically, the models are first-order moving average, or MA(1),
models with parameters $\theta = -1$ and $\theta = +1$, respectively. As such, they are also special
cases of the more general autoregressive integrated moving average (ARIMA) model
defined in (1.2.7), where the order now is (0, 0, 1). Figure 2.9 was generated in R by taking
advantage of special functions for simulating ARIMA processes and for computing the
theoretical autocorrelation function and power spectrum for these processes. The function
arima.sim() simulates a time series from a specified model, while ARMAacf() computes its
theoretical autocorrelation. Both functions are available in the stats library of R. The TSA
library includes a function ARMAspec() that computes and plots the theoretical spectrum
of an autoregressive-moving average (ARMA) process. The commands used to generate
Figure 2.9 are given below. Note, however, that the MA parameters are entered as +1
and -1, since R uses a definition that has positive signs of the MA parameters in (1.2.3).

> library(TSA)
> set.seed(12345)
> par(mfrow=c(3,2))   # Specifies panels in three rows and two columns
> plot(10+arima.sim(list(order=c(0,0,1), ma=+1.0), n=100), ylab=expression(z[t]),
+      main=expression(paste("Model (1): ", z[t] == 10+a[t]+a[t-1])))
> plot(10+arima.sim(list(order=c(0,0,1), ma=-1.0), n=100), ylab=expression(z[t]),
+      main=expression(paste("Model (2): ", z[t] == 10+a[t]-a[t-1])))
> plot(ARMAacf(ar=0,ma=1.0,10),type="h",x=(0:10),xlab="lag",ylab="ACF")
> abline(h=0)
> plot(ARMAacf(ar=0,ma=-1.0,10),type="h",x=(0:10),xlab="lag",ylab="ACF")
> abline(h=0)
> ARMAspec(model=list(ma=1.0),freq=seq(0,0.5,0.001),plot=TRUE)
> ARMAspec(model=list(ma=-1.0),freq=seq(0,0.5,0.001),plot=TRUE)
2.2.5 Advantages and Disadvantages of the Autocorrelation and Spectral Density
Functions

Because the autocorrelation function and the spectrum are transforms of each other,
they are mathematically equivalent, and therefore any discussion of their advantages and
disadvantages turns not on mathematical questions but on their representational value.
Because, as we have seen, each sheds light on a different aspect of the data, they should be
regarded not as rivals but as allies. Each contributes something to an understanding of the
stochastic process in question.

Sample estimates of the autocorrelation function and of the spectrum are obtained by
nonstructural approaches, analogous to the representation of an empirical distribution
function by a histogram. They are both ways of letting data from stationary series "speak
for themselves" and provide a first step in the analysis of time series, just as a histogram
can provide a first step in the distributional analysis of data, pointing the way to some
parametric model on which subsequent analysis will be based.

Parametric time series models, such as those of Section 2.2.4, are not necessarily associated
with a simple autocorrelation function or a simple spectrum. Working with either
of these nonstructural methods, we may be involved in the estimation of many lag correlations
and many spectral ordinates, even when a parametric model containing only one or
two parameters could represent the data. Each correlation and each spectral ordinate is a
parameter to be estimated, so that these nonstructural approaches might be very prodigal
with parameters, when the approach via the model could be parsimonious. On the other
hand, initially, we probably do not know what type of model may be appropriate, and initial
use of one or the other of these nonstructural approaches is necessary to identify the type
of model that is needed (in the same way that plotting a histogram helps to indicate which
family of distributions may be appropriate). The choice between the spectrum and the
autocorrelation function as a tool in model building depends upon the nature of the models
that turn out to be practically useful. The models that we have found useful, which we
consider in later chapters of this book, are simply described in terms of the autocorrelation
function, and it is this tool that we will employ for model specification.


APPENDIX A2.1 LINK BETWEEN THE SAMPLE SPECTRUM AND
AUTOCOVARIANCE FUNCTION ESTIMATE

Here, we derive the result (2.2.8):

$$I(f) = 2\left[c_0 + 2\sum_{k=1}^{N-1} c_k \cos(2\pi f k)\right] \qquad 0 \le f \le \tfrac{1}{2}$$

which links the sample spectrum $I(f)$ and the estimate $c_k$ of the autocovariance function.
Suppose that the least squares estimates $a_f$ and $b_f$ of the cosine and sine components, at
frequency f, in a series are combined according to $d_f = a_f - i b_f$, where $i = \sqrt{-1}$; then

$$I(f) = \frac{N}{2}(a_f - i b_f)(a_f + i b_f) = \frac{N}{2}\, d_f d_f^* \tag{A2.1.1}$$

where $d_f^*$ is the complex conjugate of $d_f$. Then, using (2.2.3) and (2.2.4), we obtain

$$d_f = \frac{2}{N}\sum_{t=1}^{N} z_t[\cos(2\pi f t) - i \sin(2\pi f t)] = \frac{2}{N}\sum_{t=1}^{N} z_t e^{-i2\pi f t} = \frac{2}{N}\sum_{t=1}^{N} (z_t - \bar{z}) e^{-i2\pi f t} \tag{A2.1.2}$$

Substituting (A2.1.2) in (A2.1.1) yields

$$I(f) = \frac{2}{N}\sum_{t=1}^{N}\sum_{t'=1}^{N} (z_t - \bar{z})(z_{t'} - \bar{z}) e^{-i2\pi f(t - t')} \tag{A2.1.3}$$

Since

$$c_k = \frac{1}{N}\sum_{t=1}^{N-k} (z_t - \bar{z})(z_{t+k} - \bar{z})$$

the transformation $k = t - t'$ transforms (A2.1.3) into the following required result:

$$I(f) = 2\sum_{k=-N+1}^{N-1} c_k e^{-i2\pi f k} = 2\left[c_0 + 2\sum_{k=1}^{N-1} c_k \cos(2\pi f k)\right] \qquad 0 \le f \le \tfrac{1}{2}$$


EXERCISES 

2.1. The following are temperature measurements $z_t$ made every minute on a chemical
reactor:


200,202,208,204,204,207,207,204,202,199,201,198,200, 

202,203,205,207,211,204,206,203,203,201,198,200,206, 

207,206,200,203,203,200,200,195,202,204.207,206,200 

(a) Plot the time series. 

(b) Plot $z_{t+1}$ versus $z_t$.

(c) Plot $z_{t+2}$ versus $z_t$.

After inspecting the graphs, do you think that the series is autocorrelated? 

2.2. State whether or not a stationary stochastic process can have the following values of
autocorrelations:

(a) $\rho_1 = 0.80$, $\rho_2 = 0.55$, $\rho_k = 0$, for $k > 2$

(b) $\rho_1 = 0.80$, $\rho_2 = 0.28$, $\rho_k = 0$, for $k > 2$

2.3. Two stationary stochastic processes $z_{1t}$ and $z_{2t}$ have the following autocovariance
functions:

$z_{1t}$: $\gamma_0 = 0.5$, $\gamma_1 = 0.2$, $\gamma_j = 0$ $(j \ge 2)$

$z_{2t}$: $\gamma_0 = 2.30$, $\gamma_1 = -1.43$, $\gamma_2 = 0.30$, $\gamma_j = 0$ $(j \ge 3)$

Calculate the autocovariance function of the process $z_{3t} = z_{1t} + 2z_{2t}$ and verify that
it is a valid stationary process.

2.4. Calculate $c_0, c_1, c_2, c_3, r_1, r_2, r_3$ for the series given in Exercise 2.1. Make a graph of
$r_k$, $k = 0, 1, 2, 3$.

2.5. On the assumption that $\rho_j = 0$ for $j > 2$, obtain the following:

(a) Approximate standard errors for $r_1$, $r_2$, and $r_j$, $j > 2$.

(b) The approximate correlation between $r_4$ and $r_5$.

2.6. The annual sales of mink furs by a North American company during 1911-1950
are included as Series N in Part Five of this book. The series is also available at
http://pages.stat.wisc.edu/~reinsel/bjr-data/.

(a) Plot the time series using R. Calculate and plot the sample autocorrelation function
of the series.

(b) Repeat the analysis in part (a) for the logarithm of the series. Do you see an
advantage in using the log transformation in this case?

2.7. Repeat the calculations in Exercise 2.6 for the annual sunspot series given as Series
E in Part Five of this book. Use a square root transformation of the data in part (b) in
Exercise 2.6. (Note: This series is also available for a slightly longer time period as
series sunspot.year in the datasets package of R.)

2.8. Calculate and plot the theoretical autocorrelation function and the spectral density
function for the AR(1) process $z_t = 0.95 z_{t-1} + a_t$. (Hint: See the R code provided
for Figure 2.9.) Based on the results, how would you expect a time series generated
from this model to fluctuate relative to its mean?

2.9. Calculate and plot the theoretical autocorrelation function and the spectral density
function for the AR(2) process $z_t + 0.35 z_{t-1} - 0.20 z_{t-2} = a_t$.

2.10. Simulate a time series of length N = 300 from the AR(2) model specified in Exercise
2.9 and plot the resulting series.

(a) Estimate and plot the autocorrelation function for the simulated series. Compare
the results with the theoretical autocorrelation function derived in Exercise 2.9.

(b) Repeat the calculations performed above for a series of length N = 70 generated
from the same process and compare the results with those for N = 200.

(c) Do the estimated autocorrelation functions derived above show any similarity to
the autocorrelation function of the chemical yield series shown in Figure 2.7? If so,
what would you conclude?
2.11. Using the data of Exercise 2.1, calculate the periodogram for periods 36, 18, 12, 9,
36/5, and 6 and construct an analysis of variance table showing the mean squares
associated with these periods and the residual mean square.

2.12. A circular stationary stochastic process with period N is defined by $z_t = z_{t+N}$.

(a) Show that (see, e.g., Brockwell and Davis, 1991; Fuller, 1996; Jenkins and Watts,
1968) when $N = 2n$, the latent roots of the $N \times N$ autocorrelation matrix of $z_t$
are

$$\lambda_k = 1 + 2\sum_{i=1}^{n-1} \rho_i \cos\left(\frac{\pi i k}{n}\right) + \rho_n \cos(\pi k)$$

$k = 1, 2, \ldots, N$ and the latent vectors corresponding to $\lambda_k, \lambda_{N-k}$ (with $\lambda_k =
\lambda_{N-k}$) are

$$\boldsymbol{\ell}_k' = \left(\cos\left(\frac{\pi k}{n}\right), \cos\left(\frac{2\pi k}{n}\right), \ldots, \cos(2\pi k)\right)$$

$$\boldsymbol{\ell}_{N-k}' = \left(\sin\left(\frac{\pi k}{n}\right), \sin\left(\frac{2\pi k}{n}\right), \ldots, \sin(2\pi k)\right)$$

(b) Verify that as N tends to infinity, with $k/N$ fixed, $\lambda_k$ tends to $g(k/N)/2$, where
$g(f)$ is the spectral density function, showing that in the limit the latent roots of
the autocorrelation matrix trace out the spectral curve.
LINEAR STATIONARY MODELS


In this chapter, we describe a general linear stochastic model that assumes that the time 
series is generated by a linear aggregation of random shocks. For practical representation, 
it is desirable to employ models that use parameters parsimoniously. Parsimony may 
often be achieved by representation of the linear process in terms of a small number of 
autoregressive-moving average (ARMA) terms. The properties of the resulting ARMA 
models are discussed in preparation for their use in model building in subsequent chapters. 


3.1 GENERAL LINEAR PROCESS 

3.1.1 Two Equivalent Forms for the Linear Process 

In Section 1.2.1, we discussed the representation of a stochastic process as the output from
a linear filter, whose input is white noise $a_t$, that is,

$$\tilde{z}_t = a_t + \psi_1 a_{t-1} + \psi_2 a_{t-2} + \cdots = a_t + \sum_{j=1}^{\infty} \psi_j a_{t-j} \tag{3.1.1}$$

where $\tilde{z}_t = z_t - \mu$ is the deviation of the process from some origin, or from its mean, if
the process is stationary. The general linear process (3.1.1) allows us to represent $\tilde{z}_t$ as a
weighted sum of present and past values of the "white noise" process $a_t$. Important early
references on the development of linear stochastic models include Yule (1927), Walker
(1931), Slutsky (1937), Wold (1938), Kendall (1945), Bartlett (1946), Quenouille (1952,
1957), Doob (1953), Grenander and Rosenblatt (1957), Hannan (1960), Robinson (1967),
among others. The usefulness of these models is well-documented in subsequent literature.
The white noise process $a_t$ may be regarded as a series of shocks that drive the system.
It consists of a sequence of uncorrelated random variables with mean zero and constant
variance, that is,

$$E[a_t] = 0 \qquad \mathrm{var}[a_t] = \sigma_a^2$$

Since the random variables $a_t$ are assumed uncorrelated, it follows that their autocovariance
function is

$$\gamma_k = E[a_t a_{t+k}] = \begin{cases} \sigma_a^2 & k = 0 \\ 0 & k \ne 0 \end{cases} \tag{3.1.2}$$

Thus, the autocorrelation function of white noise has a particularly simple form

$$\rho_k = \begin{cases} 1 & k = 0 \\ 0 & k \ne 0 \end{cases} \tag{3.1.3}$$



A fundamental result in the development of stationary processes is that of Wold (1938), who
established that any zero-mean purely nondeterministic stationary process $\tilde{z}_t$ possesses a
linear representation as in (3.1.1) with $\sum_{j=0}^{\infty} \psi_j^2 < \infty$. The $a_t$ are uncorrelated with common
variance $\sigma_a^2$ but need not be independent. We will reserve the term linear processes for
processes $\tilde{z}_t$ of the form of (3.1.1) in which the $a_t$ are independent random variables.

For $\tilde{z}_t$ defined by (3.1.1) to represent a valid stationary process, it is necessary for
the coefficients $\psi_j$ to be absolutely summable, that is, for $\sum_{j=0}^{\infty} |\psi_j| < \infty$. Under suitable
conditions (see Koopmans, 1974, p. 254), $\tilde{z}_t$ is also a weighted sum of past $\tilde{z}_t$'s and an
added shock $a_t$, that is,

$$\tilde{z}_t = \pi_1 \tilde{z}_{t-1} + \pi_2 \tilde{z}_{t-2} + \cdots + a_t = \sum_{j=1}^{\infty} \pi_j \tilde{z}_{t-j} + a_t \tag{3.1.4}$$

In this alternative form, the current deviation $\tilde{z}_t$ from the level $\mu$ may be thought of as being
"regressed" on past deviations $\tilde{z}_{t-1}, \tilde{z}_{t-2}, \ldots$ of the process.

Relationships between the ψ Weights and the π Weights. The relationships between the
$\psi$ weights and the $\pi$ weights may be obtained by using the previously defined backward
shift operator B, such that

$$B z_t = z_{t-1} \quad \text{and hence} \quad B^j z_t = z_{t-j}$$

Later, we will also need to use the forward shift operator $F = B^{-1}$, such that

$$F z_t = z_{t+1} \quad \text{and} \quad F^j z_t = z_{t+j}$$

As an example of the use of the operator B, consider the following model

$$\tilde{z}_t = a_t - \theta a_{t-1} = (1 - \theta B) a_t$$

in which $\psi_1 = -\theta$, $\psi_j = 0$ for $j > 1$. Expressing $a_t$ in terms of the $\tilde{z}_t$'s, we obtain

$$(1 - \theta B)^{-1} \tilde{z}_t = a_t$$

Hence, for $|\theta| < 1$,

$$(1 + \theta B + \theta^2 B^2 + \theta^3 B^3 + \cdots)\, \tilde{z}_t = a_t$$

and the deviation $\tilde{z}_t$ expressed in terms of previous deviations, as in (3.1.4), is

$$\tilde{z}_t = -\theta \tilde{z}_{t-1} - \theta^2 \tilde{z}_{t-2} - \theta^3 \tilde{z}_{t-3} - \cdots + a_t$$

so that for this model, $\pi_j = -\theta^j$.

Using the backshift operator B, the model (3.1.1) can be written as

$$\tilde{z}_t = \left(1 + \sum_{j=1}^{\infty} \psi_j B^j\right) a_t$$

or

$$\tilde{z}_t = \psi(B) a_t \tag{3.1.5}$$

where

$$\psi(B) = 1 + \sum_{j=1}^{\infty} \psi_j B^j = \sum_{j=0}^{\infty} \psi_j B^j$$

with $\psi_0 = 1$. As mentioned in Section 1.2.1, $\psi(B)$ is called the transfer function of the
linear filter relating $\tilde{z}_t$ to $a_t$. It can be regarded as the generating function of the $\psi$ weights,
with B now treated simply as a variable whose jth power is the coefficient of $\psi_j$.
Similarly, (3.1.4) may be written as

$$\left(1 - \sum_{j=1}^{\infty} \pi_j B^j\right) \tilde{z}_t = a_t$$

or

$$\pi(B) \tilde{z}_t = a_t \tag{3.1.6}$$

Thus,

$$\pi(B) = 1 - \sum_{j=1}^{\infty} \pi_j B^j$$

is the generating function of the $\pi$ weights. After operating on both sides of this expression
by $\psi(B)$, we obtain

$$\psi(B)\pi(B) \tilde{z}_t = \psi(B) a_t = \tilde{z}_t$$

Hence, $\psi(B)\pi(B) = 1$, so that

$$\pi(B) = \psi^{-1}(B) \tag{3.1.7}$$

This relationship may be used to derive the $\pi$ weights, knowing the $\psi$ weights, and vice
versa.
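
One way to carry out this derivation numerically is to equate coefficients of $B^j$ in $\psi(B)\pi(B) = 1$, which gives the recursion $\pi_j = \psi_j - \sum_{i=1}^{j-1} \pi_i \psi_{j-i}$ for $j \ge 1$. The R sketch below is an illustration only and is not part of the original text; the function name pi_from_psi and the parameter values are assumptions, and the second check uses the weights $\psi_j = \phi^j$ purely as a convenient test case.

```r
# pi weights from psi weights by equating coefficients in psi(B) * pi(B) = 1,
# where psi(B) = 1 + sum psi_j B^j and pi(B) = 1 - sum pi_j B^j
pi_from_psi <- function(psi, n_weights) {
  # psi holds psi_1, psi_2, ...; pad with zeros if fewer than n_weights are supplied
  psi  <- c(psi, rep(0, max(0, n_weights - length(psi))))
  pi_w <- numeric(n_weights)
  for (j in seq_len(n_weights)) {
    # sum over i = 1, ..., j - 1 of pi_i * psi_{j - i}
    conv <- if (j > 1) sum(pi_w[1:(j - 1)] * psi[(j - 1):1]) else 0
    pi_w[j] <- psi[j] - conv
  }
  pi_w
}

# Example from the text: psi_1 = -theta, psi_j = 0 for j > 1, so pi_j = -theta^j
theta <- 0.5
pi_ma1 <- pi_from_psi(-theta, 6)
print(rbind(recursion = pi_ma1, closed_form = -theta^(1:6)))

# Illustrative second case: psi_j = phi^j gives pi_1 = phi and pi_j = 0 for j > 1
phi <- 0.8
pi_geo <- pi_from_psi(phi^(1:10), 10)
print(round(pi_geo, 12))
```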

3.1.2 Autocovariance Generating Function of a Linear Process 

A basic data analysis tool for identifying models in Chapter 6 will be the autocorrelation 
function. Therefore, it is important to know the autocorrelation function of a linear process. 
It is shown in Appendix A3.1 that the autocovariance function of the linear process (3.1.1)
is given by

$$\gamma_k = \sigma_a^2 \sum_{j=0}^{\infty} \psi_j \psi_{j+k} \tag{3.1.8}$$

In particular, by setting k = 0, we find that its variance is

$$\gamma_0 = \sigma_z^2 = \sigma_a^2 \sum_{j=0}^{\infty} \psi_j^2 \tag{3.1.9}$$

It follows that the stationarity condition of absolute summability of the coefficients $\psi_j$,
$\sum_{j=0}^{\infty} |\psi_j| < \infty$, implies that the series on the right of this equation converges, and hence
guarantees that the process will have a finite variance.

Another way of obtaining the autocovariances of a linear process is via the autocovariance
generating function

$$\gamma(B) = \sum_{k=-\infty}^{\infty} \gamma_k B^k \tag{3.1.10}$$

where $\gamma_0$, the variance of the process, is the coefficient of $B^0 = 1$, while $\gamma_k$, the autocovariance
of lag k, is the coefficient of both $B^k$ and $B^{-k} = F^k$. It is shown in Appendix A3.1
that

$$\gamma(B) = \sigma_a^2 \psi(B)\psi(B^{-1}) = \sigma_a^2 \psi(B)\psi(F) \tag{3.1.11}$$

For example, suppose that $\tilde{z}_t = a_t - \theta a_{t-1} = (1 - \theta B) a_t$ so that $\psi(B) = (1 - \theta B)$. Then,

$$\gamma(B) = \sigma_a^2 (1 - \theta B)(1 - \theta B^{-1}) = \sigma_a^2 \left[-\theta B^{-1} + (1 + \theta^2) - \theta B\right]$$

Comparing with (3.1.10), the autocovariances are

$$\gamma_0 = (1 + \theta^2)\sigma_a^2$$
$$\gamma_1 = -\theta \sigma_a^2$$
$$\gamma_k = 0 \qquad k \ge 2$$
In the development that follows, when treated as a variable in a generating function, B will
be able to take on complex values. In particular, it will often be necessary to consider the
different cases when $|B| < 1$, $|B| = 1$, or $|B| > 1$, that is, when the complex number B lies
inside, on, or outside the unit circle.
3.1.3 Stationarity and Invertibility Conditions for a Linear Process

Stationarity. The convergence of the series (3.1.9) ensures that the process has a finite
variance. Also, we have seen in Section 2.1.3 that the autocovariances and autocorrelations
must satisfy a set of conditions to ensure stationarity. For a linear process (3.1.1), these
conditions are guaranteed by the single condition that $\sum_{j=0}^{\infty} |\psi_j| < \infty$. This condition can
also be embodied in the condition that the series $\psi(B)$, which is the generating function of
the $\psi$ weights, must converge for $|B| \le 1$, that is, on or within the unit circle. This result
is discussed in Appendix A3.1.

Spectrum of a Linear Stationary Process. It is shown in Appendix A3.1 that if we substitute
$B = e^{-i2\pi f}$, where $i = \sqrt{-1}$, in the autocovariance generating function (3.1.11), we obtain
one half of the power spectrum. Thus, the spectrum of a linear process is

$$p(f) = 2\sigma_a^2 \psi(e^{-i2\pi f}) \psi(e^{i2\pi f}) = 2\sigma_a^2 \left|\psi(e^{-i2\pi f})\right|^2 \qquad 0 \le f \le \tfrac{1}{2} \tag{3.1.12}$$

In fact, this is the well-known expression (e.g., Jenkins and Watts, 1968) that relates the
spectrum $p(f)$ of the output from a linear system to the uniform spectrum $2\sigma_a^2$ of a white
noise input by multiplying it with the squared gain $G^2(f) = |\psi(e^{-i2\pi f})|^2$ of the system.
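
As a numerical illustration of (3.1.12), which is not part of the original text, the R sketch below evaluates the spectrum of the process $\tilde{z}_t = (1 - \theta B) a_t$ from the squared gain and compares it with the form (2.2.10), using $\gamma_0 = (1 + \theta^2)\sigma_a^2$ and $\gamma_1 = -\theta\sigma_a^2$. The function name spectrum_linear and the parameter values are assumptions, and only finitely many $\psi$ weights are handled.

```r
# Spectrum from the squared gain, p(f) = 2 sigma_a^2 |psi(exp(-i 2 pi f))|^2,
# for a process with finitely many psi weights psi_1, ..., psi_q
spectrum_linear <- function(f, psi, sigma2_a) {
  B     <- exp(-2i * pi * f)
  psi_B <- 1 + sum(psi * B^seq_along(psi))
  2 * sigma2_a * Mod(psi_B)^2
}

theta    <- 0.6   # assumed value
sigma2_a <- 2     # assumed value
f <- seq(0, 0.5, by = 0.05)

p_gain <- sapply(f, spectrum_linear, psi = -theta, sigma2_a = sigma2_a)

# The same spectrum from (2.2.10) with the autocovariances of this process
gamma0 <- (1 + theta^2) * sigma2_a
gamma1 <- -theta * sigma2_a
p_acov <- 2 * (gamma0 + 2 * gamma1 * cos(2 * pi * f))

print(cbind(f, p_gain, p_acov))
```

With $\theta > 0$ the spectrum increases with frequency, which corresponds to the negative lag 1 autocorrelation of the process.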

Invertibility. We have seen that the $\psi$ weights of a linear process must satisfy the condition
that $\psi(B)$ converges on or within the unit circle if the process is to be stationary. We now
consider a similar restriction applied to the $\pi$ weights to ensure what is called invertibility.
This invertibility condition is independent of the stationarity condition and is also applicable
to the nonstationary linear models, which we introduce in Chapter 4.

To illustrate the basic idea of invertibility, consider again the special case

$$\tilde{z}_t = (1 - \theta B) a_t \tag{3.1.13}$$

Expressing the $a_t$'s in terms of the present and past $\tilde{z}_t$'s, this model becomes

$$a_t = (1 - \theta B)^{-1} \tilde{z}_t = (1 + \theta B + \theta^2 B^2 + \cdots + \theta^k B^k)(1 - \theta^{k+1} B^{k+1})^{-1} \tilde{z}_t$$

that is,

$$\tilde{z}_t = -\theta \tilde{z}_{t-1} - \theta^2 \tilde{z}_{t-2} - \cdots - \theta^k \tilde{z}_{t-k} + a_t - \theta^{k+1} a_{t-k-1} \tag{3.1.14}$$

If $|\theta| < 1$, on letting k tend to infinity, we obtain the infinite series

$$\tilde{z}_t = -\theta \tilde{z}_{t-1} - \theta^2 \tilde{z}_{t-2} - \cdots + a_t \tag{3.1.15}$$

and the $\pi$ weights of the model in the form of (3.1.4) are $\pi_j = -\theta^j$. Whatever the value of
$\theta$, $\tilde{z}_t = (1 - \theta B) a_t$ defines a perfectly proper stationary process. However, if $|\theta| \ge 1$, the
current deviation $\tilde{z}_t$ in (3.1.14) depends on $\tilde{z}_{t-1}, \tilde{z}_{t-2}, \ldots, \tilde{z}_{t-k}$, with weights that increase
as k increases. We avoid this situation by requiring that $|\theta| < 1$. We then say that the series
is invertible. We see that this condition is equivalent to $\sum_{j=0}^{\infty} |\theta|^j \equiv \sum_{j=0}^{\infty} |\pi_j| < \infty$, so
that the series

$$\pi(B) = (1 - \theta B)^{-1} = \sum_{j=0}^{\infty} \theta^j B^j$$

converges for all $|B| \le 1$, that is, on or within the unit circle. The invertibility requirement
is needed to associate present events with past values in a sensible manner.

The general linear process (3.1.1) is invertible and can be written in the form 

n(B)z t = a, 

if the weights nj are absolutely summable, that is, if Yi'jLo \ K j\ < 00 > which implies that 
the series n( B ) converges on or within the unit circle. 

Thus, to summarize, a linear process (3.1.1) is stationary if Y.'JLo \ Wj I < 00 and is 
invertible if \ jij \ < oo, where n(B) = x //~ 1 (6) = 1 — Syli ttjB->. 
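
As a small numerical illustration of the invertibility idea, the following R sketch evaluates the weights $\pi_j = -\theta^j$ of the first-order model (3.1.13) for one value of $\theta$ inside and one outside the unit interval. The helper name `pi_weights` and the values 0.5 and 2 are illustrative assumptions, not taken from the text.

```r
# pi weights of the model z_t = (1 - theta*B) a_t, namely pi_j = -theta^j
# (pi_weights is a hypothetical helper written for this illustration)
pi_weights <- function(theta, k) -theta^(1:k)

inv <- pi_weights(0.5, 10)  # |theta| < 1: weights on past values die out
non <- pi_weights(2, 10)    # |theta| > 1: weights grow with the lag

print(round(inv, 4))
print(non)

# Partial sums of |pi_j| stay bounded only in the invertible case
print(sum(abs(inv)))
print(sum(abs(non)))
```

With $|\theta| < 1$ the absolute weights form a convergent geometric series, whereas with $|\theta| > 1$ ever more remote observations would receive ever larger weights.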

3.1.4 Autoregressive and Moving Average Processes 

The representations (3.1.1) and (3.1.4) of the general linear process would not be very 
useful in practice if they contained an infinite number of parameters $\psi_j$ and $\pi_j$. We now 
describe a way to introduce parsimony and arrive at models that are representationally 
useful for practical applications. 

Autoregressive Processes. Consider first the special case of (3.1.4) in which only the first 
$p$ of the weights are nonzero. The model may be written as 

$$\tilde{z}_t = \phi_1 \tilde{z}_{t-1} + \phi_2 \tilde{z}_{t-2} + \cdots + \phi_p \tilde{z}_{t-p} + a_t$$   (3.1.16) 

where we now use the symbols $\phi_1, \phi_2, \ldots, \phi_p$ for the finite set of weight parameters. The 
resulting process is called an autoregressive process of order $p$, or more succinctly, an 
AR($p$) process. In particular, the AR(1) and AR(2) models 

$$\tilde{z}_t = \phi_1 \tilde{z}_{t-1} + a_t$$

$$\tilde{z}_t = \phi_1 \tilde{z}_{t-1} + \phi_2 \tilde{z}_{t-2} + a_t$$

are of considerable practical importance. 

The AR($p$) model can be written in the equivalent form 

$$(1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p) \tilde{z}_t = a_t$$

or 

$$\phi(B) \tilde{z}_t = a_t$$   (3.1.17) 

This implies that 

$$\tilde{z}_t = \frac{1}{\phi(B)} a_t = \phi^{-1}(B) a_t \equiv \psi(B) a_t$$

Hence, the autoregressive process can be thought of as the output $\tilde{z}_t$ from a linear filter 
with transfer function $\phi^{-1}(B) = \psi(B)$ when the input is white noise $a_t$. 

Moving Average Processes. Next consider the special case of (3.1.1), when only the first 
$q$ of the $\psi$ weights are nonzero. The process may be written as 

$$\tilde{z}_t = a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2} - \cdots - \theta_q a_{t-q}$$   (3.1.18) 

where we now use the symbols $-\theta_1, -\theta_2, \ldots, -\theta_q$ for the finite set of weight parameters. 
This process is called a moving average process¹ of order $q$, which we often abbreviate as 
MA($q$). The special cases of MA(1) and MA(2) models 

$$\tilde{z}_t = a_t - \theta_1 a_{t-1}$$

$$\tilde{z}_t = a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2}$$

are again particularly important in practice. 

Using the backshift operator $B a_t = a_{t-1}$, the MA($q$) model can be written in the 
equivalent form as 

$$\tilde{z}_t = (1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q) a_t$$

or more succinctly as 

$$\tilde{z}_t = \theta(B) a_t$$   (3.1.19) 

Hence, the moving average process can be thought of as the output $\tilde{z}_t$ from a linear filter 
with transfer function $\theta(B)$ when the input is white noise $a_t$. 

Mixed Autoregressive-Moving Average Processes. As discussed in Section 3.1.1, the 
finite moving average process 

$$\tilde{z}_t = a_t - \theta_1 a_{t-1} = (1 - \theta_1 B) a_t \qquad |\theta_1| < 1$$

can also be written as an infinite autoregressive process 

$$\tilde{z}_t = -\theta_1 \tilde{z}_{t-1} - \theta_1^2 \tilde{z}_{t-2} - \cdots + a_t$$

However, if the process really was MA(1), we would not obtain a parsimonious 
representation using an autoregressive model. Conversely, an AR(1) process could not be 
parsimoniously represented using a moving average model. In practice, to obtain parsimonious 
parameterization, it is often useful to include both autoregressive and moving average 
terms in the model. The resulting model 

$$\tilde{z}_t = \phi_1 \tilde{z}_{t-1} + \cdots + \phi_p \tilde{z}_{t-p} + a_t - \theta_1 a_{t-1} - \cdots - \theta_q a_{t-q}$$

or 

$$\phi(B) \tilde{z}_t = \theta(B) a_t$$   (3.1.20) 

is called the mixed autoregressive-moving average process of order $(p, q)$, which we 
abbreviate as ARMA($p$, $q$). For example, the ARMA(1,1) process is 

$$\tilde{z}_t = \phi_1 \tilde{z}_{t-1} + a_t - \theta_1 a_{t-1}$$


1 As we remarked in Chapter 1, the term “moving average” is somewhat misleading since the weights do not sum 
to unity. However, this nomenclature is now well established and we will use it here. 

Now writing 

$$\tilde{z}_t = \phi^{-1}(B) \theta(B) a_t = \frac{\theta(B)}{\phi(B)} a_t = \frac{1 - \theta_1 B - \cdots - \theta_q B^q}{1 - \phi_1 B - \cdots - \phi_p B^p} a_t$$

we see that the mixed ARMA process can be thought of as the output $\tilde{z}_t$ from a linear filter, 
whose transfer function is the ratio of two polynomial operators $\theta(B)$ and $\phi(B)$, when the 
input is white noise $a_t$. Furthermore, since $\tilde{z}_t = z_t - \mu$, where $\mu = E[z_t]$ is the mean of the 
process in the stationary case, the general ARMA($p$, $q$) process can also be written in terms 
of the original process $z_t$ as 

$$\phi(B) z_t = \theta_0 + \theta(B) a_t$$   (3.1.21) 

where the constant term $\theta_0$ is 

$$\theta_0 = (1 - \phi_1 - \phi_2 - \cdots - \phi_p) \mu$$   (3.1.22) 
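
To make the relation between the constant term and the process mean concrete, here is a minimal R sketch of (3.1.22). The coefficient values and the mean of 10 are illustrative assumptions only.

```r
# Constant term theta_0 = (1 - phi_1 - ... - phi_p) * mu for an ARMA(p, q) model
phi <- c(0.75, -0.5)   # illustrative autoregressive coefficients
mu  <- 10              # illustrative process mean

theta0 <- (1 - sum(phi)) * mu
print(theta0)          # (1 - 0.75 + 0.5) * 10

# Going the other way: recover the mean from the constant term
mu_back <- theta0 / (1 - sum(phi))
print(mu_back)
```

Note that the constant term equals the mean only when there are no autoregressive terms; otherwise it is scaled by $1 - \sum_i \phi_i$.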

In the next sections, we discuss some important characteristics of autoregressive, 
moving average, and mixed models. In particular, we study their variances, autocorrelation 
functions, spectra, and the stationarity and invertibility conditions that must be imposed on 
their parameters. 


3.2 AUTOREGRESSIVE PROCESSES 

3.2.1 Stationarity Conditions for Autoregressive Processes 

The parameters $\phi_1, \phi_2, \ldots, \phi_p$ of an AR($p$) process 

$$\tilde{z}_t = \phi_1 \tilde{z}_{t-1} + \cdots + \phi_p \tilde{z}_{t-p} + a_t$$

or 

$$(1 - \phi_1 B - \cdots - \phi_p B^p) \tilde{z}_t = \phi(B) \tilde{z}_t = a_t$$

must satisfy certain conditions for the process to be stationary. For illustration, the AR(1) 
process 

$$(1 - \phi_1 B) \tilde{z}_t = a_t$$

may be written as 

$$\tilde{z}_t = (1 - \phi_1 B)^{-1} a_t = \sum_{j=0}^{\infty} \phi_1^j a_{t-j}$$

provided that the infinite series on the right converges in an appropriate sense. Hence, 

$$\psi(B) = (1 - \phi_1 B)^{-1} = \sum_{j=0}^{\infty} \phi_1^j B^j$$   (3.2.1) 




We have seen in Section 3.1.3 that for stationarity, $\psi(B)$ must converge for $|B| \le 1$, or 
equivalently that $\sum_{j=0}^{\infty} |\phi_1|^j < \infty$. This implies that the parameter $\phi_1$ of an AR(1) process 
must satisfy the condition $|\phi_1| < 1$ to ensure stationarity. Since the root of $1 - \phi_1 B = 0$ 
is $B = \phi_1^{-1}$, this condition is equivalent to saying that the root of $1 - \phi_1 B = 0$ must lie 
outside the unit circle. 

The general AR($p$) process $\phi(B) \tilde{z}_t = a_t$ can be written as 

$$\tilde{z}_t = \phi^{-1}(B) a_t \equiv \psi(B) a_t = \sum_{j=0}^{\infty} \psi_j a_{t-j}$$

provided that the right-side expression is convergent. Using the factorization 

$$\phi(B) = (1 - G_1 B)(1 - G_2 B) \cdots (1 - G_p B)$$

where $G_1^{-1}, \ldots, G_p^{-1}$ are the roots of $\phi(B) = 0$, and expanding $\phi^{-1}(B)$ in partial fractions 
yields 

$$\tilde{z}_t = \phi^{-1}(B) a_t = \sum_{i=1}^{p} \frac{K_i}{1 - G_i B} a_t$$

Hence, if $\psi(B) = \phi^{-1}(B)$ is to be a convergent series for $|B| \le 1$, that is, if the weights 
$\psi_j = \sum_{i=1}^{p} K_i G_i^j$ are to be absolutely summable so that the AR($p$) process is stationary, 
we must have $|G_i| < 1$, for $i = 1, \ldots, p$. Equivalently, the roots of $\phi(B) = 0$ must lie 
outside the unit circle. The roots of the equation $\phi(B) = 0$ may be referred to as the zeros 
of the polynomial $\phi(B)$. Thus, for stationarity, the zeros of $\phi(B)$ must lie outside the unit 
circle. A similar argument may be applied when the zeros of $\phi(B)$ are not all distinct. The 
equation $\phi(B) = 0$ is called the characteristic equation for the process. 

Note also that the roots of $\phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^p = 0$ are the reciprocals of the 
roots of the polynomial equation in $m$, 

$$m^p - \phi_1 m^{p-1} - \cdots - \phi_p = 0$$

Hence, the stationarity condition that all roots of $\phi(B) = 0$ must lie outside the unit circle, 
that is, be greater than 1 in absolute value, is equivalent to the condition that all roots of 
$m^p - \phi_1 m^{p-1} - \cdots - \phi_p = 0$ must lie inside the unit circle, that is, be less than 1 in absolute 
value. 
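
The root condition is easy to check numerically. The sketch below uses the base R function `polyroot()`, which expects polynomial coefficients in increasing powers; the helper name `is_stationary_ar` and the coefficient values are illustrative assumptions.

```r
# Stationarity check for an AR(p) model: all zeros of
# phi(B) = 1 - phi_1 B - ... - phi_p B^p must lie outside the unit circle.
# (is_stationary_ar is a hypothetical helper written for this illustration)
is_stationary_ar <- function(phi) {
  roots <- polyroot(c(1, -phi))  # coefficients in increasing powers of B
  all(Mod(roots) > 1)
}

phi <- c(0.75, -0.5)

# Zeros of phi(B) and of the reciprocal polynomial m^2 - phi_1 m - phi_2
roots_B <- polyroot(c(1, -phi))
roots_m <- polyroot(c(-rev(phi), 1))

print(Mod(roots_B))  # moduli greater than 1
print(Mod(roots_m))  # moduli less than 1 (the reciprocals)

print(is_stationary_ar(phi))          # stationary
print(is_stationary_ar(c(0.5, 0.6)))  # not stationary: phi_1 + phi_2 > 1
```

The moduli of the two sets of roots are reciprocals of one another, so either form of the condition may be used.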

Since the series $\pi(B) = \phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^p$ is finite, no restrictions are 
required on the parameters of an autoregressive process to ensure invertibility. 

$\psi$ Weights. Since $\psi(B) = 1/\phi(B)$ so that $\phi(B)\psi(B) = 1$, it readily follows that the weights 
$\psi_j$ for the AR($p$) process satisfy the difference equation 

$$\psi_j = \phi_1 \psi_{j-1} + \phi_2 \psi_{j-2} + \cdots + \phi_p \psi_{j-p} \qquad j > 0$$

with $\psi_0 = 1$ and $\psi_j = 0$ for $j < 0$, from which the weights $\psi_j$ can easily be computed 
recursively in terms of the $\phi_i$. In fact, as seen from the principles of linear difference 
equations as discussed in Appendix A4.1, the fact that the weights $\psi_j$ satisfy the difference 
equation discussed earlier implies that they have an explicit representation in the form of 
$\psi_j = \sum_{i=1}^{p} K_i G_i^j$ for the case of distinct roots. 
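
The recursion for the $\psi$ weights can be sketched in a few lines of R. The function name `psi_weights` is a hypothetical helper and the AR(2) coefficients are illustrative; `ARMAtoMA()` from the stats package is used only as an independent check.

```r
# psi weights of an AR(p) process from psi_j = phi_1 psi_{j-1} + ... + phi_p psi_{j-p},
# with psi_0 = 1 and psi_j = 0 for j < 0 (psi_weights is a hypothetical helper)
psi_weights <- function(phi, n) {
  p <- length(phi)
  # positions 1..p hold psi_{-p}, ..., psi_{-1} = 0; position p + 1 holds psi_0 = 1
  psi <- c(rep(0, p), 1, numeric(n))
  for (j in 1:n) {
    idx <- p + 1 + j
    psi[idx] <- sum(phi * psi[idx - (1:p)])
  }
  psi[(p + 1):(p + 1 + n)]  # psi_0, psi_1, ..., psi_n
}

phi <- c(0.75, -0.5)
psi <- psi_weights(phi, 10)
print(round(psi, 4))

# Independent check: ARMAtoMA() returns psi_1, ..., psi_lag.max
print(round(ARMAtoMA(ar = phi, lag.max = 10), 4))
```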

3.2.2 Autocorrelation Function and Spectrum of Autoregressive Processes 

Autocorrelation Function. An important recurrence relation for the autocorrelation 
function of a stationary autoregressive process is found by multiplying throughout in 

$$\tilde{z}_t = \phi_1 \tilde{z}_{t-1} + \cdots + \phi_p \tilde{z}_{t-p} + a_t$$

by $\tilde{z}_{t-k}$, for $k \ge 0$, to obtain 

$$\tilde{z}_{t-k} \tilde{z}_t = \phi_1 \tilde{z}_{t-k} \tilde{z}_{t-1} + \phi_2 \tilde{z}_{t-k} \tilde{z}_{t-2} + \cdots + \phi_p \tilde{z}_{t-k} \tilde{z}_{t-p} + \tilde{z}_{t-k} a_t$$   (3.2.2) 

Now, on taking expected values, we obtain the difference equation 

$$\gamma_k = \phi_1 \gamma_{k-1} + \phi_2 \gamma_{k-2} + \cdots + \phi_p \gamma_{k-p} \qquad k > 0$$   (3.2.3) 

Note that the expectation $E[\tilde{z}_{t-k} a_t]$ is zero for $k > 0$, since $\tilde{z}_{t-k}$ can only involve the shocks 
$a_j$ up to time $t - k$, which are uncorrelated with $a_t$. On dividing throughout in (3.2.3) by 
$\gamma_0$, we see that the autocorrelation function satisfies the same form of difference equation 

$$\rho_k = \phi_1 \rho_{k-1} + \phi_2 \rho_{k-2} + \cdots + \phi_p \rho_{k-p} \qquad k > 0$$   (3.2.4) 

Note that this is analogous to the difference equation satisfied by the process $\tilde{z}_t$ itself, but 
without the random shock input $a_t$. 

Now suppose that this equation is written as 

$$\phi(B) \rho_k = 0$$

where $\phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^p$ and $B$ now operates on $k$ and not $t$. Then, writing 

$$\phi(B) = \prod_{i=1}^{p} (1 - G_i B)$$

the general solution for $\rho_k$ in (3.2.4) (see, e.g., Appendix A4.1) is 

$$\rho_k = A_1 G_1^k + A_2 G_2^k + \cdots + A_p G_p^k$$   (3.2.5) 

where $G_1^{-1}, G_2^{-1}, \ldots, G_p^{-1}$ are the roots of the characteristic equation 

$$\phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p = 0$$

or equivalently, $G_1, G_2, \ldots, G_p$ are the roots of $m^p - \phi_1 m^{p-1} - \cdots - \phi_p = 0$. 

For stationarity, we require that $|G_i| < 1$. Thus, two situations can arise in practice if 
we assume that the roots $G_i$ are distinct. 

1. A root $G_i$ is real, in which case a term $A_i G_i^k$ in (3.2.5) decays to zero geometrically 
as $k$ increases. We often refer to this as a damped exponential. 

2. A pair of roots $G_i$ and $G_j$ are complex conjugates, in which case they contribute a 
term 

$$D^k \sin(2\pi f k + F)$$

to the autocorrelation function (3.2.5), which follows a damped sine wave, with 
damping factor $D = |G_i| = |G_j|$ and frequency $f$ such that $2\pi f = \cos^{-1}[\,|\mathrm{Re}(G_i)|/D\,]$. 

In general, the autocorrelation function of a stationary autoregressive process will 
consist of a mixture of damped exponentials and damped sine waves. 


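Autoregressive Parameters in Terms of the Autocorrelations: Yule-Walker Equations. If 
we substitute $k = 1, 2, \ldots, p$ in (3.2.4), we obtain a set of linear equations for $\phi_1, \phi_2, \ldots, \phi_p$ 
in terms of $\rho_1, \rho_2, \ldots, \rho_p$, that is, 

$$\begin{aligned}
\rho_1 &= \phi_1 + \phi_2 \rho_1 + \cdots + \phi_p \rho_{p-1} \\
\rho_2 &= \phi_1 \rho_1 + \phi_2 + \cdots + \phi_p \rho_{p-2} \\
&\;\;\vdots \\
\rho_p &= \phi_1 \rho_{p-1} + \phi_2 \rho_{p-2} + \cdots + \phi_p
\end{aligned}$$   (3.2.6) 

These are the well-known Yule-Walker equations (Yule, 1927; Walker, 1931). We obtain 
Yule-Walker estimates of the parameters by replacing the theoretical autocorrelations $\rho_k$ 
by the estimated autocorrelations $r_k$. Note that, if we write 

$$\boldsymbol{\phi} = \begin{bmatrix} \phi_1 \\ \phi_2 \\ \vdots \\ \phi_p \end{bmatrix} \qquad
\boldsymbol{\rho}_p = \begin{bmatrix} \rho_1 \\ \rho_2 \\ \vdots \\ \rho_p \end{bmatrix} \qquad
\mathbf{P}_p = \begin{bmatrix}
1 & \rho_1 & \rho_2 & \cdots & \rho_{p-1} \\
\rho_1 & 1 & \rho_1 & \cdots & \rho_{p-2} \\
\vdots & \vdots & \vdots & & \vdots \\
\rho_{p-1} & \rho_{p-2} & \rho_{p-3} & \cdots & 1
\end{bmatrix}$$

the solution of (3.2.6) for the parameters $\boldsymbol{\phi}$ in terms of the autocorrelations may be written 
as 

$$\boldsymbol{\phi} = \mathbf{P}_p^{-1} \boldsymbol{\rho}_p$$   (3.2.7) 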
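
As a hedged illustration of (3.2.7), the following R sketch builds the matrix $\mathbf{P}_p$ with `toeplitz()` and solves the Yule-Walker equations for an AR(2) model whose theoretical autocorrelations are obtained from `ARMAacf()` in the stats package. The coefficient values are illustrative assumptions.

```r
# Solve the Yule-Walker equations phi = P_p^{-1} rho_p for an AR(2) model
phi_true <- c(0.75, -0.5)                      # illustrative coefficients
p <- length(phi_true)

rho   <- ARMAacf(ar = phi_true, lag.max = p)   # rho_0, rho_1, ..., rho_p
P     <- toeplitz(unname(rho[1:p]))            # P_p, built from rho_0, ..., rho_{p-1}
rho_p <- unname(rho[2:(p + 1)])                # (rho_1, ..., rho_p)'

phi <- solve(P, rho_p)
print(phi)                                     # recovers phi_true
```

Replacing the theoretical autocorrelations by sample autocorrelations, for example from `acf(x, plot = FALSE)$acf`, would give Yule-Walker estimates in the same way.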


Variance. When $k = 0$, the contribution from the term $E[\tilde{z}_{t-k} a_t]$, on taking expectations 
in (3.2.2), is $E[a_t^2] = \sigma_a^2$, since the only part of $\tilde{z}_t$ that will be correlated with $a_t$ is the most 
recent shock, $a_t$. Hence, when $k = 0$, 

$$\gamma_0 = \phi_1 \gamma_{-1} + \phi_2 \gamma_{-2} + \cdots + \phi_p \gamma_{-p} + \sigma_a^2$$

On substituting $\gamma_{-k} = \gamma_k$ and writing $\gamma_k = \gamma_0 \rho_k$, the variance $\gamma_0 = \sigma_z^2$ may be written as 

$$\sigma_z^2 = \frac{\sigma_a^2}{1 - \phi_1 \rho_1 - \phi_2 \rho_2 - \cdots - \phi_p \rho_p}$$   (3.2.8) 


Spectrum. For the AR($p$) process, $\psi(B) = \phi^{-1}(B)$ and 

$$\phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p$$

Therefore, using (3.1.12), the spectrum of an autoregressive process is 

$$p(f) = \frac{2\sigma_a^2}{|1 - \phi_1 e^{-i2\pi f} - \phi_2 e^{-i4\pi f} - \cdots - \phi_p e^{-i2p\pi f}|^2} \qquad 0 \le f \le \tfrac{1}{2}$$   (3.2.9) 


We now discuss two particularly important autoregressive processes, those of first and 
second order. 

3.2.3 The First-Order Autoregressive Process 

The first-order autoregressive process is 

$$\tilde{z}_t = \phi_1 \tilde{z}_{t-1} + a_t = a_t + \phi_1 a_{t-1} + \phi_1^2 a_{t-2} + \cdots$$   (3.2.10) 

where it has been shown in Section 3.2.1 that $\phi_1$ must satisfy the condition $-1 < \phi_1 < 1$ 
for the process to be stationary. 

Autocorrelation Function. Using (3.2.4), the autocorrelation function satisfies the 
first-order difference equation 

$$\rho_k = \phi_1 \rho_{k-1} \qquad k > 0$$   (3.2.11) 

which, with $\rho_0 = 1$, has the solution 

$$\rho_k = \phi_1^k \qquad k \ge 0$$   (3.2.12) 

Since $-1 < \phi_1 < 1$, the autocorrelation function decays exponentially to zero when $\phi_1$ is 
positive but decays exponentially to zero and oscillates in sign when $\phi_1$ is negative. In 
particular, we note that 

$$\rho_1 = \phi_1$$   (3.2.13) 

Variance. Using (3.2.8), the variance of the process is 

$$\sigma_z^2 = \frac{\sigma_a^2}{1 - \rho_1 \phi_1} = \frac{\sigma_a^2}{1 - \phi_1^2}$$   (3.2.14) 

on substituting $\rho_1 = \phi_1$. 

Spectrum. Finally, using (3.2.9), the spectrum is 

$$p(f) = \frac{2\sigma_a^2}{|1 - \phi_1 e^{-i2\pi f}|^2} = \frac{2\sigma_a^2}{1 + \phi_1^2 - 2\phi_1 \cos(2\pi f)} \qquad 0 \le f \le \tfrac{1}{2}$$   (3.2.15) 


Example. Figure 3.1 shows realizations from two AR(1) processes with $\phi_1 = 0.8$ and 
$\phi_1 = -0.8$, and the corresponding theoretical autocorrelation functions and spectra. Thus, 
when the parameter has the large positive value $\phi_1 = 0.8$, neighboring values in the series 
are similar and the series exhibits marked trends. This is reflected in the autocorrelation 
function, which slowly decays to zero, and in the spectrum, which is dominated by low 
frequencies. When the parameter has the large negative value $\phi_1 = -0.8$, the series tends 
to oscillate rapidly, and this is reflected in the autocorrelation function, which alternates 
in sign as it decays to zero, and in the spectrum, which is dominated by high frequencies. 

FIGURE 3.1 Realizations from two first-order autoregressive processes and their corresponding 
theoretical autocorrelation functions and spectral density functions. (Left panels: AR(1) process 
with $\phi = 0.8$; right panels: AR(1) process with $\phi = -0.8$.) 

Figure 3.1 was generated in R and can be reproduced as follows: 

> library(TSA) 
> set.seed(12345) 
> par(mfrow=c(3,2)) 
> # Simulated realizations of the two AR(1) processes 
> plot(arima.sim(list(order=c(1,0,0),ar=0.8),n=100),ylab=expression(z[t]), 
+   main=expression("AR(1) process with "*phi*"=0.8")) 
> plot(arima.sim(list(order=c(1,0,0),ar=-0.8),n=100),ylab=expression(z[t]), 
+   main=expression("AR(1) process with "*phi*"=-0.8")) 
> # Theoretical autocorrelations at lags 1 to 15 
> plot(ARMAacf(ar=0.8,lag.max=15)[-1],type="h",ylab="ACF",xlab="lag") 
> abline(h=0) 
> plot(ARMAacf(ar=-0.8,lag.max=15)[-1],type="h",ylab="ACF",xlab="lag") 
> abline(h=0) 
> # Theoretical spectra 
> ARMAspec(model=list(ar=0.8),freq=seq(0,0.5,0.001),plot=TRUE) 
> ARMAspec(model=list(ar=-0.8),freq=seq(0,0.5,0.001),plot=TRUE) 

3.2.4 Second-Order Autoregressive Process 

Stationarity Condition. The second-order autoregressive process can be written as 

$$\tilde{z}_t = \phi_1 \tilde{z}_{t-1} + \phi_2 \tilde{z}_{t-2} + a_t$$   (3.2.16) 

FIGURE 3.2 Typical autocorrelation and partial autocorrelation functions $\rho_k$ and $\phi_{kk}$ for various 
stationary AR(2) models (Source: Stralkowski, 1968). 

For stationarity, the roots of 

$$\phi(B) = 1 - \phi_1 B - \phi_2 B^2 = 0$$   (3.2.17) 

must lie outside the unit circle, which implies that the parameters $\phi_1$ and $\phi_2$ must lie in the 
triangular region 

$$\begin{aligned}
\phi_2 + \phi_1 &< 1 \\
\phi_2 - \phi_1 &< 1 \\
-1 < \phi_2 &< 1
\end{aligned}$$   (3.2.18) 

as shown in Figure 3.2. 
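
The equivalence between the triangular region (3.2.18) and the root condition on (3.2.17) can be checked numerically. In this R sketch the helper names `in_triangle` and `roots_outside` and the test points are illustrative assumptions.

```r
# Compare the triangular region (3.2.18) with the root condition for AR(2) models
# (in_triangle and roots_outside are hypothetical helpers for this illustration)
in_triangle <- function(phi1, phi2) {
  (phi2 + phi1 < 1) && (phi2 - phi1 < 1) && (abs(phi2) < 1)
}
roots_outside <- function(phi1, phi2) {
  all(Mod(polyroot(c(1, -phi1, -phi2))) > 1)
}

# Illustrative parameter pairs (phi1, phi2), some inside and some outside the triangle
cases <- rbind(c(0.75, -0.5), c(0.5, 0.6), c(-1.2, -0.5), c(1.5, -0.4), c(0.2, -1.1))

tri <- apply(cases, 1, function(x) in_triangle(x[1], x[2]))
rts <- apply(cases, 1, function(x) roots_outside(x[1], x[2]))
print(data.frame(phi1 = cases[, 1], phi2 = cases[, 2], triangle = tri, roots = rts))
```

For each pair the two criteria agree, as expected.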

Autocorrelation Function. Using (3.2.4), the autocorrelation function satisfies the 
second-order difference equation 

$$\rho_k = \phi_1 \rho_{k-1} + \phi_2 \rho_{k-2} \qquad k > 0$$   (3.2.19) 

with starting values $\rho_0 = 1$ and $\rho_1 = \phi_1/(1 - \phi_2)$. From (3.2.5), the general solution to this 
difference equation is 

$$\rho_k = A_1 G_1^k + A_2 G_2^k = \frac{G_1 (1 - G_2^2) G_1^k - G_2 (1 - G_1^2) G_2^k}{(G_1 - G_2)(1 + G_1 G_2)}$$   (3.2.20) 

where $G_1^{-1}$ and $G_2^{-1}$ are the roots of the characteristic equation $\phi(B) = 0$. When the 
roots are real, the autocorrelation function consists of a mixture of damped exponentials. 
This occurs when $\phi_1^2 + 4\phi_2 \ge 0$ and corresponds to regions 1 and 2, which lie above the 
parabolic boundary in Figure 3.2. Specifically, in region 1, the autocorrelation function 
remains positive as it damps out, corresponding to a positive dominant root in (3.2.20). In 
region 2, the autocorrelation function alternates in sign as it damps out, corresponding to a 
negative dominant root. 

If the roots $G_1$ and $G_2$ are complex ($\phi_1^2 + 4\phi_2 < 0$), a second-order autoregressive 
process displays pseudoperiodic behavior. This behavior is reflected in the autocorrelation 
function, for on substituting $G_1 = D e^{i2\pi f_0}$ and $G_2 = D e^{-i2\pi f_0}$ ($0 < f_0 < \tfrac{1}{2}$) in (3.2.20), 
we obtain 

$$\rho_k = \frac{D^k \sin(2\pi f_0 k + F)}{\sin F}$$   (3.2.21) 

We refer to this as a damped sine wave with damping factor $D$, frequency $f_0$, and phase 
$F$. These factors are related to the process parameters as follows: 

$$D = |G_i| = \sqrt{-\phi_2}$$   (3.2.22) 

where the positive square root is taken, 

$$\cos(2\pi f_0) = \frac{\mathrm{Re}(G_i)}{D} = \frac{\phi_1}{2\sqrt{-\phi_2}}$$   (3.2.23) 

$$\tan F = \frac{1 + D^2}{1 - D^2} \tan(2\pi f_0)$$   (3.2.24) 

Again referring to Figure 3.2, the autocorrelation function is a damped sine wave in 
regions 3 and 4, the phase angle $F$ being less than 90° in region 4 and lying between 90° 
and 180° in region 3. This means that the autocorrelation function starts with a positive 
value throughout region 4 but always switches sign from lag 0 to lag 1 in region 3. 
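
The damped sine wave representation can be verified numerically. The following R sketch computes $D$, $f_0$, and $F$ from (3.2.22) to (3.2.24) for an illustrative AR(2) model with complex roots and compares (3.2.21) with the autocorrelations returned by `ARMAacf()`. The variable names and coefficient values are assumptions made for the illustration.

```r
# Damped sine wave form of the AR(2) autocorrelation function
phi <- c(0.75, -0.5)                       # illustrative coefficients
stopifnot(phi[1]^2 + 4 * phi[2] < 0)       # complex roots required

D  <- sqrt(-phi[2])                        # damping factor, (3.2.22)
w0 <- acos(phi[1] / (2 * sqrt(-phi[2])))   # 2*pi*f0, (3.2.23)
f0 <- w0 / (2 * pi)
# Phase from (3.2.24), mapped into (0, pi) so that region 3 gives 90 to 180 degrees
phase <- atan((1 + D^2) / (1 - D^2) * tan(w0)) %% pi

k <- 0:10
rho_sine <- D^k * sin(w0 * k + phase) / sin(phase)   # (3.2.21)
rho_rec  <- unname(ARMAacf(ar = phi, lag.max = 10))  # from the recursion (3.2.19)

print(c(D = D, f0 = f0, period = 1 / f0, phase_degrees = phase * 180 / pi))
print(round(cbind(lag = k, rho_sine, rho_rec), 4))
```

The two columns of autocorrelations agree, and the phase is below 90° because this parameter pair has $\phi_1 > 0$.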


Yule-Walker Equations. For the AR(2) model, the Yule-Walker equations become 

$$\begin{aligned}
\rho_1 &= \phi_1 + \phi_2 \rho_1 \\
\rho_2 &= \phi_1 \rho_1 + \phi_2
\end{aligned}$$   (3.2.25) 

which, when solved for $\phi_1$ and $\phi_2$, give 

$$\phi_1 = \frac{\rho_1 (1 - \rho_2)}{1 - \rho_1^2} \qquad \phi_2 = \frac{\rho_2 - \rho_1^2}{1 - \rho_1^2}$$   (3.2.26) 

These equations can also be solved to express $\rho_1$ and $\rho_2$ in terms of $\phi_1$ and $\phi_2$ to give 

$$\rho_1 = \frac{\phi_1}{1 - \phi_2} \qquad \rho_2 = \phi_2 + \frac{\phi_1^2}{1 - \phi_2}$$   (3.2.27) 

which provide the starting values for the recursions in (3.2.19). Expressions (3.2.20) and 
(3.2.21) are useful for explaining the different patterns for $\rho_k$ that may arise in practice. 
However, for computing the autocorrelations of an AR(2) process, it is simplest to make 
direct use of the recursions implied by (3.2.19). 

Using the stationarity condition (3.2.18) and the expressions for $\rho_1$ and $\rho_2$ in (3.2.27), 
it can be seen that the admissible values of $\rho_1$ and $\rho_2$, for a stationary AR(2) process, must 
lie in the region 

$$\begin{aligned}
-1 < \rho_1 &< 1 \\
-1 < \rho_2 &< 1 \\
\rho_1^2 &< \tfrac{1}{2}(\rho_2 + 1)
\end{aligned}$$

The admissible region for the parameters $\phi_1$ and $\phi_2$ is shown in Figure 3.3(a), while Figure 
3.3(b) shows the corresponding admissible region for $\rho_1$ and $\rho_2$. 

FIGURE 3.3 Admissible regions for (a) $\phi_1$, $\phi_2$ and (b) $\rho_1$, $\rho_2$, for a stationary AR(2) process. 

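Variance. From (3.2.8), the variance of the AR(2) process is 

$$\sigma_z^2 = \frac{\sigma_a^2}{1 - \rho_1 \phi_1 - \rho_2 \phi_2} = \left( \frac{1 - \phi_2}{1 + \phi_2} \right) \frac{\sigma_a^2}{(1 - \phi_2)^2 - \phi_1^2}$$   (3.2.28) 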
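
The two forms of the variance can be checked against each other, and against the sum $\sigma_a^2 \sum_j \psi_j^2$ implied by the linear filter representation. This R sketch assumes illustrative values $\phi_1 = 0.75$, $\phi_2 = -0.5$, and $\sigma_a^2 = 1$, and truncates the infinite sum at 200 terms.

```r
# Variance of an AR(2) process computed three ways
phi <- c(0.75, -0.5)   # illustrative coefficients
sigma2_a <- 1          # illustrative shock variance

rho <- unname(ARMAacf(ar = phi, lag.max = 2))[-1]   # rho_1, rho_2

v1 <- sigma2_a / (1 - sum(phi * rho))                                       # (3.2.8)
v2 <- (1 - phi[2]) / (1 + phi[2]) * sigma2_a / ((1 - phi[2])^2 - phi[1]^2)  # (3.2.28)
v3 <- sigma2_a * (1 + sum(ARMAtoMA(ar = phi, lag.max = 200)^2))  # truncated psi-weight sum

print(c(v1 = v1, v2 = v2, v3 = v3))
```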

Spectrum. From (3.2.9), the spectrum is 

$$p(f) = \frac{2\sigma_a^2}{|1 - \phi_1 e^{-i2\pi f} - \phi_2 e^{-i4\pi f}|^2} = \frac{2\sigma_a^2}{1 + \phi_1^2 + \phi_2^2 - 2\phi_1 (1 - \phi_2) \cos(2\pi f) - 2\phi_2 \cos(4\pi f)} \qquad 0 \le f \le \tfrac{1}{2}$$   (3.2.29) 

The spectrum also reflects the pseudoperiodic behavior that the series exhibits when the 
roots of the characteristic equation are complex. For illustration, Figure 3.4(a) shows 70 
values of a series generated from the AR(2) model 

$$\tilde{z}_t = 0.75 \tilde{z}_{t-1} - 0.50 \tilde{z}_{t-2} + a_t$$

Figure 3.4(b) shows the corresponding theoretical autocorrelation function. The roots of 
the characteristic equation 

$$1 - 0.75B + 0.5B^2 = 0$$

are complex, so that the pseudoperiodic behavior observed in the series is to be expected. 
We clearly see this behavior reflected in the theoretical autocorrelation function of Figure 
3.4(b), the average apparent period being about 6. The damping factor $D$ and frequency 
$f_0$, from (3.2.22) and (3.2.23), are 

$$D = \sqrt{0.50} = 0.71 \qquad f_0 = \frac{\cos^{-1}(0.5303)}{2\pi} = \frac{1}{6.2}$$

Thus, the fundamental period of the autocorrelation function is 6.2. In addition, the 
theoretical spectral density function in Figure 3.4(c) shows that a large proportion of the variance 
of the series is accounted for by frequencies in the neighborhood of $f_0$. 

FIGURE 3.4 (a) Time series generated from a second-order autoregressive process 
$\tilde{z}_t = 0.75 \tilde{z}_{t-1} - 0.50 \tilde{z}_{t-2} + a_t$, along with (b) the theoretical autocorrelation function, and 
(c) the spectral density function for the same process. 

Figure 3.4 was generated in R using the following commands: 


> library(TSA) 
> # Theoretical ACF; ARMAacf() in the stats package takes ar= directly 
> ar.acf=ARMAacf(ar=c(0.75,-0.5),lag.max=20) 
> layout(matrix(c(1,1,2,3),2,2,byrow=TRUE)) 
> plot(arima.sim(list(order=c(2,0,0),ar=c(0.75,-0.5)),n=70),ylab=expression(z[t]), 
+   xlab="Time",main="Simulated AR(2) process") 
> plot(0:20,ar.acf,type="h",xlab="Lag",ylab="ACF",main="(b)") 
> abline(h=0) 
> # ARMAspec() draws the spectrum itself when plot=TRUE; freq is its own argument 
> ARMAspec(model=list(ar=c(0.75,-0.5)),freq=seq(0,0.5,0.0005),plot=TRUE) 


3.2.5 Partial Autocorrelation Function 

In practice, we typically do not know the order of the autoregressive process initially, 
and the order has to be specified from the data. The problem is analogous to deciding on 
the number of independent variables to be included in a multiple regression. The partial 
autocorrelation function is a tool that exploits the fact that, whereas an AR(p) process has 
an autocorrelation function that is infinite in extent, the partial autocorrelations are zero 
beyond lag p. 

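The partial autocorrelations can be described in terms of $p$ nonzero functions of the 
autocorrelations. Denote by $\phi_{kj}$ the $j$th coefficient in an autoregressive representation of 
order $k$, so that $\phi_{kk}$ is the last coefficient. From (3.2.4), the $\phi_{kj}$ satisfy the set of equations 

$$\rho_j = \phi_{k1} \rho_{j-1} + \cdots + \phi_{k(k-1)} \rho_{j-k+1} + \phi_{kk} \rho_{j-k} \qquad j = 1, 2, \ldots, k$$   (3.2.30) 

leading to the Yule-Walker equations (3.2.6), which may be written as 

$$\begin{bmatrix}
1 & \rho_1 & \rho_2 & \cdots & \rho_{k-1} \\
\rho_1 & 1 & \rho_1 & \cdots & \rho_{k-2} \\
\vdots & \vdots & \vdots & & \vdots \\
\rho_{k-1} & \rho_{k-2} & \rho_{k-3} & \cdots & 1
\end{bmatrix}
\begin{bmatrix} \phi_{k1} \\ \phi_{k2} \\ \vdots \\ \phi_{kk} \end{bmatrix}
=
\begin{bmatrix} \rho_1 \\ \rho_2 \\ \vdots \\ \rho_k \end{bmatrix}$$   (3.2.31) 

or 

$$\mathbf{P}_k \boldsymbol{\phi}_k = \boldsymbol{\rho}_k$$   (3.2.32) 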
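
Solving these equations for $k = 1, 2, 3, \ldots$ successively, we obtain 

$$\phi_{11} = \rho_1$$

$$\phi_{22} = \frac{\begin{vmatrix} 1 & \rho_1 \\ \rho_1 & \rho_2 \end{vmatrix}}{\begin{vmatrix} 1 & \rho_1 \\ \rho_1 & 1 \end{vmatrix}} = \frac{\rho_2 - \rho_1^2}{1 - \rho_1^2}$$

$$\phi_{33} = \frac{\begin{vmatrix} 1 & \rho_1 & \rho_1 \\ \rho_1 & 1 & \rho_2 \\ \rho_2 & \rho_1 & \rho_3 \end{vmatrix}}{\begin{vmatrix} 1 & \rho_1 & \rho_2 \\ \rho_1 & 1 & \rho_1 \\ \rho_2 & \rho_1 & 1 \end{vmatrix}}$$   (3.2.33) 

In general, for $\phi_{kk}$, the determinant in the numerator has the same elements as that in the 
denominator, but with the last column replaced by $\boldsymbol{\rho}_k$. The quantity $\phi_{kk}$, regarded as a 
function of the lag $k$, is called the partial autocorrelation function. 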

For an AR($p$) process, the partial autocorrelations $\phi_{kk}$ will be nonzero for $k \le p$ and 
zero for $k > p$. In other words, the partial autocorrelation function of the AR($p$) process has 
a cutoff after lag $p$. For the AR(2) process, partial autocorrelation functions $\phi_{kk}$ are shown 
in each of the four regions of Figure 3.2. As a numerical example, the partial 
autocorrelations of the AR(2) process $\tilde{z}_t = 0.75 \tilde{z}_{t-1} - 0.50 \tilde{z}_{t-2} + a_t$ considered in Figure 3.4 are 
$\phi_{11} = \rho_1 = 0.5$, $\phi_{22} = (\rho_2 - \rho_1^2)/(1 - \rho_1^2) = -0.5 \equiv \phi_2$, and $\phi_{kk} = 0$, for all $k > 2$. 
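
The numerical example can be reproduced with the determinant ratios in (3.2.33). The R sketch below uses the autocorrelations from `ARMAacf()` and evaluates the first three partial autocorrelations; the variable names are assumptions made for the illustration.

```r
# Partial autocorrelations of the AR(2) example via the determinant ratios (3.2.33)
phi <- c(0.75, -0.5)
rho <- unname(ARMAacf(ar = phi, lag.max = 3))[-1]   # rho_1, rho_2, rho_3

phi11 <- rho[1]

# matrix() fills column by column; the last column holds (rho_1, ..., rho_k)
num22 <- matrix(c(1, rho[1],   rho[1], rho[2]), 2)
den22 <- matrix(c(1, rho[1],   rho[1], 1), 2)
phi22 <- det(num22) / det(den22)

num33 <- matrix(c(1, rho[1], rho[2],   rho[1], 1, rho[1],   rho[1], rho[2], rho[3]), 3)
den33 <- toeplitz(c(1, rho[1], rho[2]))
phi33 <- det(num33) / det(den33)

print(round(c(phi11, phi22, phi33), 6))

# The same quantities as computed by the stats package
print(round(ARMAacf(ar = phi, lag.max = 3, pacf = TRUE), 6))
```

The third value is zero up to rounding error, illustrating the cutoff after lag $p = 2$.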

The quantity $\phi_{kk}$ is called the partial autocorrelation of the process $\{z_t\}$ at lag $k$, since it 
equals the partial correlation between the variables $z_t$ and $z_{t-k}$ adjusted for the intermediate 
variables $z_{t-1}, z_{t-2}, \ldots, z_{t-k+1}$ (or the correlation between $z_t$ and $z_{t-k}$ not accounted for 
by $z_{t-1}, z_{t-2}, \ldots, z_{t-k+1}$). Now, it is easy to establish from least squares theory that the 
values $\phi_{k1}, \phi_{k2}, \ldots, \phi_{kk}$, which are the solutions to (3.2.31), are the regression coefficients 
in the linear regression of $z_t$ on $z_{t-1}, \ldots, z_{t-k}$, that is, they are the values of coefficients 
$b_1, \ldots, b_k$, which minimize $E[(z_t - b_0 - \sum_{i=1}^{k} b_i z_{t-i})^2]$. Hence, assuming for convenience 
that the process $\{z_t\}$ has mean zero, the best linear predictor, in the mean squared error 
sense, of $z_t$ based on $z_{t-1}, z_{t-2}, \ldots, z_{t-k+1}$ is 

$$\hat{z}_t = \phi_{k-1,1} z_{t-1} + \phi_{k-1,2} z_{t-2} + \cdots + \phi_{k-1,k-1} z_{t-k+1}$$

whether the process is an AR or not. Similarly, the best linear predictor of $z_{t-k}$ based on 
the (future) values $z_{t-1}, z_{t-2}, \ldots, z_{t-k+1}$ is 

$$\hat{z}_{t-k} = \phi_{k-1,1} z_{t-k+1} + \phi_{k-1,2} z_{t-k+2} + \cdots + \phi_{k-1,k-1} z_{t-1}$$

Then, the lag $k$ partial autocorrelation of $\{z_t\}$, $\phi_{kk}$, can be defined as the correlation between 
the residuals from these two regressions on $z_{t-1}, \ldots, z_{t-k+1}$, that is, 

$$\phi_{kk} = \mathrm{corr}[z_t - \hat{z}_t,\; z_{t-k} - \hat{z}_{t-k}]$$   (3.2.34) 

TABLE 3.1 Estimated Partial Autocorrelation Function for the Chemical Yield Data in 
Figure 2.1 

  k    $\hat{\phi}_{kk}$      k    $\hat{\phi}_{kk}$      k    $\hat{\phi}_{kk}$ 
  1     -0.39                 6                          11      0.14 
  2      0.18                 7                          12     -0.01 
  3      0.00                 8      0.00                13      0.09 
  4     -0.04                 9     -0.06                14 
  5     -0.07                10      0.00                15 


As examples, we find that $\phi_{11} = \mathrm{corr}[z_t, z_{t-1}] = \rho_1$, while 

$$\phi_{22} = \mathrm{corr}[z_t - \rho_1 z_{t-1},\; z_{t-2} - \rho_1 z_{t-1}] = \frac{\gamma_2 - 2\rho_1 \gamma_1 + \rho_1^2 \gamma_0}{[(\gamma_0 + \rho_1^2 \gamma_0 - 2\rho_1 \gamma_1)^2]^{1/2}} = \frac{\rho_2 - \rho_1^2}{1 - \rho_1^2}$$

which agrees with the results in (3.2.33) derived from the Yule-Walker equations. Higher 
order partial autocorrelations $\phi_{kk}$ defined through (3.2.34) can similarly be shown to be 
the solution to the appropriate set of Yule-Walker equations. 

3.2.6 Estimation of the Partial Autocorrelation Function 

The partial autocorrelations may be estimated by fitting successively autoregressive models 
of orders 1, 2, 3, ... by least squares and picking out the estimates $\hat{\phi}_{11}, \hat{\phi}_{22}, \hat{\phi}_{33}, \ldots$ of the 
last coefficient fitted at each stage. Alternatively, if the values of the parameters are not 
too close to the nonstationary boundaries, approximate Yule-Walker estimates of the 
successive autoregressive models may be employed. The estimated partial autocorrelations 
can then be obtained by substituting estimates $r_j$ for the theoretical autocorrelations in 
(3.2.30), to yield 

$$r_j = \hat{\phi}_{k1} r_{j-1} + \hat{\phi}_{k2} r_{j-2} + \cdots + \hat{\phi}_{k(k-1)} r_{j-k+1} + \hat{\phi}_{kk} r_{j-k} \qquad j = 1, 2, \ldots, k$$   (3.2.35) 

and solving the resultant equations for $k = 1, 2, \ldots$. This can be done using a simple 
recursive method due to Levinson (1947) and Durbin (1960), which we describe in Appendix 
A3.2. However, these estimates obtained from (3.2.35) become very sensitive to rounding 
errors and should not be used if the values of the parameters are close to the nonstationary 
boundaries. 
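
A minimal R sketch of the recursive solution of (3.2.35) is given below. It implements the standard Durbin-Levinson update, $\hat{\phi}_{k+1,k+1} = (r_{k+1} - \sum_{j=1}^{k} \hat{\phi}_{kj} r_{k+1-j}) / (1 - \sum_{j=1}^{k} \hat{\phi}_{kj} r_j)$ and $\hat{\phi}_{k+1,j} = \hat{\phi}_{kj} - \hat{\phi}_{k+1,k+1} \hat{\phi}_{k,k+1-j}$, which is assumed here to correspond to the method of Appendix A3.2. The function name `durbin_levinson_pacf` is hypothetical, and theoretical autocorrelations of an illustrative AR(2) model stand in for the sample values $r_j$.

```r
# Durbin-Levinson recursion for partial autocorrelations
# (durbin_levinson_pacf is a hypothetical helper; r holds r_1, ..., r_K, lag 0 excluded)
durbin_levinson_pacf <- function(r, K) {
  pacf_out <- numeric(K)
  phi <- r[1]                      # phi_11
  pacf_out[1] <- phi
  if (K >= 2) {
    for (k in 1:(K - 1)) {
      num <- r[k + 1] - sum(phi * r[k:1])   # r_{k+1} - sum_j phi_kj r_{k+1-j}
      den <- 1 - sum(phi * r[1:k])          # 1 - sum_j phi_kj r_j
      phi_new <- num / den                  # phi_{k+1,k+1}
      phi <- c(phi - phi_new * rev(phi), phi_new)
      pacf_out[k + 1] <- phi_new
    }
  }
  pacf_out
}

# Illustration with the theoretical autocorrelations of an AR(2) model
r <- unname(ARMAacf(ar = c(0.75, -0.5), lag.max = 5))[-1]
pac <- durbin_levinson_pacf(r, 5)
print(round(pac, 6))
```

In practice `r` would hold sample autocorrelations, for example `acf(x, plot = FALSE)$acf[-1]` for an observed series `x`.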

3.2.7 Standard Errors of Partial Autocorrelation Estimates 

It was shown by Quenouille (1949) that on the hypothesis that the process is autoregressive 
of order $p$, the estimated partial autocorrelations of order $p + 1$, and higher, are 
approximately independently and normally distributed with zero mean. Also, if $n$ is the number of 
observations used in fitting, 

$$\mathrm{var}[\hat{\phi}_{kk}] \simeq \frac{1}{n} \qquad k \ge p + 1$$

Thus, the standard error (SE) of the estimated partial autocorrelation $\hat{\phi}_{kk}$ is 

$$\mathrm{SE}[\hat{\phi}_{kk}] = \hat{\sigma}[\hat{\phi}_{kk}] \simeq \frac{1}{\sqrt{n}} \qquad k \ge p + 1$$   (3.2.36) 


3.2.8 Calculations in R 

The estimation of the partial autocorrelation function is conveniently performed in R. 
For example, the command pacf(Yield) in the stats package gives the estimated partial 
autocorrelations shown in Table 3.1 for the chemical yield data plotted in Figure 2.1. 
An alternative is to use the command acf2() in the R package astsa. This command 
has the advantage that it produces plots of the autocorrelation and partial autocorrelation 
functions in a single graph. This allows easy comparison of the two functions, which will 
be useful for specifying a model for the time series. Figure 3.5 shows a graph of the 15 first 
autocorrelations and partial autocorrelations for the chemical yield data produced using 
this routine. The patterns of the two functions resemble those of an AR(1) process with 
a negative value of $\phi_1$, or possibly an AR(2) process with a dominant negative root (see 
region 2 of Figure 3.2). Also shown in Figure 3.5 by dashed lines are the two SE limits 
calculated on the assumption that the process is white noise. Since $\hat{\phi}_{22}$ is the second biggest 
partial autocorrelation, the possibility that the process is AR(2) should be kept in mind. 


FIGURE 3.5 Estimated autocorrelation and partial autocorrelation functions for the chemical yield 
data in Series F. 

The use of the autocorrelation and partial autocorrelation functions for model specification 
will be discussed more fully in Chapter 6. Figure 3.5 was generated using the following R 
commands: 

> library(astsa) 
> seriesF=read.table("SeriesF.txt",header=TRUE) 
> Yield=ts(seriesF) 
> acf2(Yield,15) 


3.3 MOVING AVERAGE PROCESSES 

3.3.1 Invertibility Conditions for Moving Average Processes 

We now derive the conditions that the parameters $\theta_1, \theta_2, \ldots, \theta_q$ must satisfy to ensure the 
invertibility of the MA($q$) process: 

$$\tilde{z}_t = a_t - \theta_1 a_{t-1} - \cdots - \theta_q a_{t-q} = (1 - \theta_1 B - \cdots - \theta_q B^q) a_t = \theta(B) a_t$$   (3.3.1) 

We have already seen in Section 3.1.3 that the first-order moving average process 

$$\tilde{z}_t = (1 - \theta_1 B) a_t$$

is invertible if $|\theta_1| < 1$; that is, 

$$\pi(B) = (1 - \theta_1 B)^{-1} = \sum_{j=0}^{\infty} \theta_1^j B^j$$

converges on or within the unit circle. However, this is equivalent to saying that the root, 
$B = \theta_1^{-1}$ of $(1 - \theta_1 B) = 0$, lies outside the unit circle. 

The invertibility condition for higher order MA processes may be obtained by writing 
$\tilde{z}_t = \theta(B) a_t$ as 

$$a_t = \theta^{-1}(B) \tilde{z}_t$$

Hence, if 

$$\theta(B) = \prod_{i=1}^{q} (1 - H_i B)$$

where $H_1^{-1}, \ldots, H_q^{-1}$ are the roots of $\theta(B) = 0$, then, on expanding in partial fractions, we 
obtain 

$$\pi(B) = \theta^{-1}(B) = \sum_{i=1}^{q} \left( \frac{M_i}{1 - H_i B} \right)$$

which converges, or equivalently, the weights $\pi_j = -\sum_{i=1}^{q} M_i H_i^j$ are absolutely 
summable, if $|H_i| < 1$, for $i = 1, 2, \ldots, q$. It follows that the invertibility condition for 
an MA($q$) process is that the roots $H_i^{-1}$ of the characteristic equation 

$$\theta(B) = 1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q = 0$$   (3.3.2) 

lie outside the unit circle. From the relation $\theta(B)\pi(B) = 1$, it follows that the weights $\pi_j$ 
satisfy the difference equation 

$$\pi_j = \theta_1 \pi_{j-1} + \theta_2 \pi_{j-2} + \cdots + \theta_q \pi_{j-q} \qquad j > 0$$

with the convention that $\pi_0 = -1$ and $\pi_j = 0$ for $j < 0$, from which the weights $\pi_j$ can 
easily be computed recursively in terms of the $\theta_i$. 

Note that since the series 

$$\psi(B) = \theta(B) = 1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q$$

is finite, no restrictions are needed on the parameters of the moving average process to 
ensure stationarity. 
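
Both the root condition (3.3.2) and the recursion for the $\pi$ weights can be sketched in R. The helper names `is_invertible_ma` and `pi_weights_ma` and the parameter values are illustrative assumptions; the sign convention follows the text, in which the moving average operator is $1 - \theta_1 B - \cdots - \theta_q B^q$.

```r
# Invertibility check and pi weights for an MA(q) process
# (is_invertible_ma and pi_weights_ma are hypothetical helpers)
is_invertible_ma <- function(theta) {
  all(Mod(polyroot(c(1, -theta))) > 1)   # zeros of theta(B) outside the unit circle
}

pi_weights_ma <- function(theta, n) {
  q <- length(theta)
  # positions 1..q hold pi_{-q}, ..., pi_{-1} = 0; position q + 1 holds pi_0 = -1
  w <- c(rep(0, q), -1, numeric(n))
  for (j in 1:n) {
    idx <- q + 1 + j
    w[idx] <- sum(theta * w[idx - (1:q)])
  }
  w[(q + 2):(q + 1 + n)]                 # pi_1, ..., pi_n
}

print(is_invertible_ma(0.5))             # MA(1) with |theta_1| < 1
print(is_invertible_ma(2))               # MA(1) with |theta_1| > 1
print(is_invertible_ma(c(0.5, 0.3)))     # an illustrative MA(2)

pw <- pi_weights_ma(0.5, 8)
print(pw)                                # equals -0.5^j for the MA(1) case
```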

3.3.2 Autocorrelation Function and Spectrum of Moving Average Processes 

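Autocorrelation Function. The autocovariance function of an MA($q$) process is

$$
\begin{aligned}
\gamma_k &= E[(a_t - \theta_1 a_{t-1} - \cdots - \theta_q a_{t-q})(a_{t-k} - \theta_1 a_{t-k-1} - \cdots - \theta_q a_{t-k-q})] \\
         &= -\theta_k E[a_{t-k}^2] + \theta_1\theta_{k+1} E[a_{t-k-1}^2] + \cdots + \theta_{q-k}\theta_q E[a_{t-q}^2]
\end{aligned}
$$

since the $a_t$ are uncorrelated, and $\gamma_k = 0$ for $k > q$. Hence, the variance of the process is

$$ \gamma_0 = (1 + \theta_1^2 + \theta_2^2 + \cdots + \theta_q^2)\,\sigma_a^2 \tag{3.3.3} $$

and

$$
\gamma_k =
\begin{cases}
(-\theta_k + \theta_1\theta_{k+1} + \theta_2\theta_{k+2} + \cdots + \theta_{q-k}\theta_q)\,\sigma_a^2 & k = 1, 2, \ldots, q \\
0 & k > q
\end{cases}
$$

Thus, the autocorrelation function is

$$
\rho_k =
\begin{cases}
\dfrac{-\theta_k + \theta_1\theta_{k+1} + \cdots + \theta_{q-k}\theta_q}{1 + \theta_1^2 + \cdots + \theta_q^2} & k = 1, 2, \ldots, q \\
0 & k > q
\end{cases}
\tag{3.3.4}
$$

We see that the autocorrelation function of an MA($q$) process is zero beyond the order $q$
of the process. In other words, the autocorrelation function of a moving average process
has a cutoff after lag $q$.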
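Formula (3.3.4) translates directly into a few lines of R. The sketch below is an illustration added here, not code from the text; `ma_acf` is a hypothetical helper, and the parameter values $\theta_1 = 0.8$, $\theta_2 = -0.5$ are simply those of the MA(2) example used later in the chapter. Because `ARMAacf()` in the R stats package defines the moving average operator with plus signs, the comparison passes `-theta`.

```r
# Sketch of (3.3.4): theoretical autocorrelations of an MA(q) process written as
# z_t = a_t - theta_1 a_{t-1} - ... - theta_q a_{t-q}.
# ma_acf is a hypothetical helper name.
ma_acf <- function(theta, lag.max) {
  q <- length(theta)
  th <- c(-1, theta)                 # th[i + 1] = theta_i, with theta_0 = -1
  denom <- sum(th^2)                 # 1 + theta_1^2 + ... + theta_q^2
  sapply(seq_len(lag.max), function(k) {
    if (k > q) return(0)             # cutoff after lag q
    i <- 0:(q - k)
    sum(th[i + 1] * th[i + k + 1]) / denom
  })
}

theta <- c(0.8, -0.5)
rho <- ma_acf(theta, 5)
# stats::ARMAacf uses plus signs in the MA operator, so pass -theta
rho_R <- unname(ARMAacf(ma = -theta, lag.max = 5)[-1])
print(round(rbind(rho, rho_R), 4))
```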

Moving Average Parameters in Terms of Autocorrelations. If $\rho_1, \rho_2, \ldots, \rho_q$ are known,
the $q$ equations (3.3.4) may be solved for the parameters $\theta_1, \theta_2, \ldots, \theta_q$. However, unlike
the Yule-Walker equations (3.2.6) for an autoregressive process, the equations (3.3.4)
are nonlinear. Hence, except in the simple case where $q = 1$, which is discussed shortly,
these equations have to be solved iteratively. Estimates of the moving average parameters
may be obtained by substituting estimates $r_k$ for $\rho_k$ and solving the resulting equations.
However, unlike the autoregressive estimates obtained from the Yule-Walker equations,
the resulting moving average estimates may not have high statistical efficiency. Nevertheless,
they can provide useful rough estimates at the model identification stage discussed
in Chapter 6. Furthermore, they provide useful starting values for an iterative parameter
estimation procedure, discussed in Chapter 7, which converges to the efficient maximum
likelihood estimates.

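Spectrum. For the MA($q$) process,

$$ \psi(B) = \theta(B) = 1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q $$

Therefore, using (3.1.12), the spectrum of an MA($q$) process is

$$ p(f) = 2\sigma_a^2 \left| 1 - \theta_1 e^{-i2\pi f} - \theta_2 e^{-i4\pi f} - \cdots - \theta_q e^{-i2q\pi f} \right|^2 \qquad 0 \le f \le \tfrac{1}{2} \tag{3.3.5} $$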

We now discuss in greater detail the moving average processes of first and second order, 
which are of considerable practical importance. 

3.3.3 First-Order Moving Average Process 

We have already introduced the MA(1) process

$$
\begin{aligned}
z_t &= a_t - \theta_1 a_{t-1} \\
    &= (1 - \theta_1 B)\, a_t
\end{aligned}
$$

and we have seen that $\theta_1$ must lie in the range $-1 < \theta_1 < 1$ for the process to be invertible.
The process is, of course, stationary for all values of $\theta_1$.


Autocorrelation Function. It is easy to see that the variance of this process equals

$$ \gamma_0 = (1 + \theta_1^2)\,\sigma_a^2 $$

The autocorrelation function is

$$
\rho_k =
\begin{cases}
\dfrac{-\theta_1}{1 + \theta_1^2} & k = 1 \\
0 & k > 1
\end{cases}
\tag{3.3.6}
$$

from which it is noted that $\rho_1$ must satisfy $|\rho_1| = |\theta_1|/(1 + \theta_1^2) \le \tfrac{1}{2}$. Also, for $k = 1$, we
find that

$$ \rho_1\theta_1^2 + \theta_1 + \rho_1 = 0 \tag{3.3.7} $$

with roots for $\theta_1$ equal to $\theta_1 = (-1 \pm \sqrt{1 - 4\rho_1^2})/(2\rho_1)$. Since the product of the roots is
unity, we see that if $\theta_1$ is a solution, so is $\theta_1^{-1}$. Furthermore, if $\theta_1$ satisfies the invertibility
condition $|\theta_1| < 1$, the other root $\theta_1^{-1}$ will be greater than unity in absolute value and will not
satisfy the condition. For example, if $\rho_1 = -0.4$, the two solutions are $\theta_1 = 0.5$ and $\theta_1 = 2.0$. However,
only the solution $\theta_1 = 0.5$ corresponds to an invertible model.
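The quadratic (3.3.7) is easy to solve numerically. The following R sketch is an added illustration rather than part of the original text; `ma1_theta_from_rho1` is a hypothetical helper name, and the input $\rho_1 = -0.4$ repeats the example just given.

```r
# Sketch: solve rho_1 theta_1^2 + theta_1 + rho_1 = 0, equation (3.3.7), for theta_1.
# ma1_theta_from_rho1 is a hypothetical helper name.
ma1_theta_from_rho1 <- function(rho1) {
  stopifnot(rho1 != 0, abs(rho1) <= 0.5)    # real roots require |rho_1| <= 1/2
  disc <- sqrt(1 - 4 * rho1^2)
  roots <- (-1 + c(1, -1) * disc) / (2 * rho1)
  list(roots = roots, invertible = roots[abs(roots) < 1])
}

sol <- ma1_theta_from_rho1(-0.4)
sol$roots        # the two solutions, reciprocals of each other
sol$invertible   # the solution with |theta_1| < 1
```

In practice $\rho_1$ would be replaced by the sample estimate $r_1$, in which case the helper fails whenever $|r_1| > 0.5$, a reminder that not every sample value is consistent with an MA(1) model.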
Spectrum. Using (3.3.5), the spectrum of the MA(1) process is

$$
\begin{aligned}
p(f) &= 2\sigma_a^2 \left| 1 - \theta_1 e^{-i2\pi f} \right|^2 \\
     &= 2\sigma_a^2 \left[ 1 + \theta_1^2 - 2\theta_1 \cos(2\pi f) \right] \qquad 0 \le f \le \tfrac{1}{2}
\end{aligned}
\tag{3.3.8}
$$

In general, when $\theta_1$ is negative, $\rho_1$ is positive, and the spectrum is dominated by low
frequencies. Conversely, when $\theta_1$ is positive, $\rho_1$ is negative, and the spectrum is dominated
by high frequencies.


Partial Autocorrelation Function. Using (3.2.31) with $\rho_1 = -\theta_1/(1 + \theta_1^2)$ and $\rho_k = 0$, for
$k > 1$, we obtain after some algebraic manipulation

$$ \phi_{kk} = \frac{-\theta_1^k (1 - \theta_1^2)}{1 - \theta_1^{2(k+1)}} $$

Thus, $|\phi_{kk}| < |\theta_1|^k$, and the partial autocorrelation function is dominated by a damped
exponential. If $\rho_1$ is positive, so that $\theta_1$ is negative, the partial autocorrelations alternate
in sign. If, however, $\rho_1$ is negative, so that $\theta_1$ is positive, the partial autocorrelations are
negative. From (3.1.15), it has been seen that the weights $\pi_j$ for the MA(1) process are
$\pi_j = -\theta_1^j$, and hence since these are coefficients in the infinite autoregressive form of
the process, it makes sense that the partial autocorrelation function $\phi_{kk}$ for the MA(1)
essentially mimics the exponential decay feature of the weights $\pi_j$.
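The closed-form expression for $\phi_{kk}$ can be compared with the values that R computes from the autocorrelations. This check is an added sketch, not part of the original text; `ma1_pacf` is a hypothetical helper and $\theta_1 = \pm 0.5$ are arbitrary illustrative values.

```r
# Sketch: closed-form partial autocorrelations of an MA(1) process,
#   phi_kk = -theta_1^k (1 - theta_1^2) / (1 - theta_1^(2(k+1))),
# compared with stats::ARMAacf(pacf = TRUE). ma1_pacf is a hypothetical helper.
ma1_pacf <- function(theta1, lag.max) {
  k <- seq_len(lag.max)
  -theta1^k * (1 - theta1^2) / (1 - theta1^(2 * (k + 1)))
}

theta1 <- 0.5
closed <- ma1_pacf(theta1, 6)
# ARMAacf uses plus signs in the MA operator, so pass -theta1
numeric_pacf <- as.numeric(ARMAacf(ma = -theta1, lag.max = 6, pacf = TRUE))
print(round(cbind(lag = 1:6, closed, numeric_pacf), 5))

# With theta_1 negative the partial autocorrelations alternate in sign
sign(ma1_pacf(-0.5, 4))
```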

We now note a duality between the AR(1) and the MA(1) processes. Thus, whereas the 
autocorrelation function of an MA(1) process has a cutoff after lag 1, the autocorrelation 
function of an AR(1) process tails off exponentially. Conversely, whereas the partial 
autocorrelation function of an MA(1) process tails off and is dominated by a damped 
exponential, the partial autocorrelation function of an AR(1) process has a cutoff after 
lag 1. It turns out that a corresponding approximate duality of this kind occurs in general in 
the autocorrelation and partial autocorrelation functions between AR and MA processes. 


3.3.4 Second-Order Moving Average Process 

Invertibility Conditions. The second-order moving average process is defined by

$$
\begin{aligned}
z_t &= a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2} \\
    &= (1 - \theta_1 B - \theta_2 B^2)\, a_t
\end{aligned}
$$

and is stationary for all values of $\theta_1$ and $\theta_2$. However, it is invertible only if the roots of the
characteristic equation

$$ 1 - \theta_1 B - \theta_2 B^2 = 0 \tag{3.3.9} $$

lie outside the unit circle, that is,

$$
\begin{aligned}
\theta_2 + \theta_1 &< 1 \\
\theta_2 - \theta_1 &< 1 \\
-1 < \theta_2 &< 1
\end{aligned}
\tag{3.3.10}
$$

These are parallel to conditions (3.2.18) required for the stationarity of an AR(2)
process.
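The equivalence between the triangular region (3.3.10) and the root condition can be checked numerically for particular parameter values. The R sketch below is an added illustration; the helper names and the four test pairs $(\theta_1, \theta_2)$ are assumptions made for the example.

```r
# Sketch: the triangular conditions (3.3.10) versus a direct check on the roots
# of 1 - theta_1 B - theta_2 B^2 = 0. Helper names are hypothetical.
ma2_in_triangle <- function(theta1, theta2) {
  (theta2 + theta1 < 1) && (theta2 - theta1 < 1) && (abs(theta2) < 1)
}
ma2_roots_outside <- function(theta1, theta2) {
  all(Mod(polyroot(c(1, -theta1, -theta2))) > 1)
}

cases <- rbind(c(0.8, -0.5),   # inside the triangle
               c(1.3, -0.4),   # inside the triangle, real roots 1.25 and 2
               c(0.5,  0.6),   # violates theta_2 + theta_1 < 1
               c(0.0, -1.2))   # violates -1 < theta_2
for (i in seq_len(nrow(cases))) {
  t1 <- cases[i, 1]; t2 <- cases[i, 2]
  cat(sprintf("theta1 = %4.1f, theta2 = %4.1f: triangle = %s, roots outside = %s\n",
              t1, t2, ma2_in_triangle(t1, t2), ma2_roots_outside(t1, t2)))
}
```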


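Autocorrelation Function. Using (3.3.3), the variance of the process is

$$ \gamma_0 = \sigma_a^2 (1 + \theta_1^2 + \theta_2^2) $$

and using (3.3.4), the autocorrelation function is

$$
\begin{aligned}
\rho_1 &= \frac{-\theta_1 (1 - \theta_2)}{1 + \theta_1^2 + \theta_2^2} \\
\rho_2 &= \frac{-\theta_2}{1 + \theta_1^2 + \theta_2^2} \\
\rho_k &= 0 \qquad k > 2
\end{aligned}
\tag{3.3.11}
$$

Thus, the autocorrelation function has a cutoff after lag 2.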

It follows from (3.3.10) and (3.3.11) that the first two autocorrelations of an invertible
MA(2) process must lie within the area bounded by segments of the curves

$$
\begin{aligned}
\rho_2 + \rho_1 &= -0.5 \\
\rho_2 - \rho_1 &= -0.5 \\
\rho_1^2 &= 4\rho_2 (1 - 2\rho_2)
\end{aligned}
\tag{3.3.12}
$$

The invertibility region (3.3.10) for the parameters is shown in Figure 3.6(a) and the
corresponding admissible region (3.3.12) for the autocorrelations in Figure 3.6(b). The latter
shows whether a given pair of autocorrelations $\rho_1$ and $\rho_2$ is consistent with the assumption
that the model is an MA(2) process. If they are consistent, the values of the parameters
$\theta_1$ and $\theta_2$ can be obtained by solving the nonlinear equations (3.3.11). To facilitate this
calculation, Chart C in the Collection of Tables and Charts in Part Five has been prepared
so that the values of $\theta_1$ and $\theta_2$ can be read off directly, given $\rho_1$ and $\rho_2$.

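Spectrum. Using (3.3.5), the spectrum of the MA(2) process is

$$
\begin{aligned}
p(f) &= 2\sigma_a^2 \left| 1 - \theta_1 e^{-i2\pi f} - \theta_2 e^{-i4\pi f} \right|^2 \\
     &= 2\sigma_a^2 \left[ 1 + \theta_1^2 + \theta_2^2 - 2\theta_1 (1 - \theta_2) \cos(2\pi f) - 2\theta_2 \cos(4\pi f) \right] \qquad 0 \le f \le \tfrac{1}{2}
\end{aligned}
\tag{3.3.13}
$$

and is the reciprocal of the spectrum (3.2.29) of a second-order autoregressive process,
apart from the constant $2\sigma_a^2$.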

Partial Autocorrelation Function. The exact expression for the partial autocorrelation
function of an MA(2) process is complicated, but it is dominated by the sum of two
exponentials if the roots of the characteristic equation $1 - \theta_1 B - \theta_2 B^2 = 0$ are real, and
by a damped sine wave if the roots are complex. Thus, it behaves like the autocorrelation
function of an AR(2) process. The autocorrelation functions and partial autocorrelation
functions for various values of the parameters within the invertible region are shown in
Figure 3.7. Comparison of Figure 3.7 with Figure 3.2, which shows the corresponding
autocorrelations and partial autocorrelations for an AR(2) process, illustrates the duality
between the MA(2) and the AR(2) processes.

FIGURE 3.7 Autocorrelation and partial autocorrelation functions $\rho_k$ and $\phi_{kk}$ for various MA(2)
models.


Example. For illustration, consider the second-order moving average model

$$ z_t = a_t - 0.8a_{t-1} + 0.5a_{t-2} $$

The variance of the process is $\gamma_0 = \sigma_a^2 (1 + (0.8)^2 + (-0.5)^2) = 1.89\sigma_a^2$, and from (3.3.11)
the theoretical autocorrelations are

$$
\rho_1 = \frac{-0.8(1 - (-0.5))}{1 + (0.8)^2 + (-0.5)^2} = \frac{-1.20}{1.89} = -0.635
\qquad
\rho_2 = \frac{-(-0.5)}{1.89} = 0.265
$$

and $\rho_k = 0$, for $k > 2$. The theoretical partial autocorrelations are obtained by solving
(3.2.31) successively; the first several values are $\phi_{11} = \rho_1 = -0.635$, $\phi_{22} = (\rho_2 - \rho_1^2)/(1 - \rho_1^2) = -0.232$,
$\phi_{33} = 0.105$, $\phi_{44} = 0.191$, and $\phi_{55} = 0.102$.

Figure 3.8 shows the autocorrelation and partial autocorrelation functions up to
15 lags for this example. Note the partial autocorrelations $\phi_{kk}$ display an approximate
damped sinusoidal behavior with moderate rate of damping, similar to the behavior
depicted for region 4 in Figure 3.7. This is consistent with the fact that the roots of
$\theta(B) = 0$ are complex with modulus (damping factor) $D = \sqrt{0.5} \approx 0.71$ and frequency
$f_0 = \cos^{-1}(0.5657)/(2\pi) = 1/6.48$ in this example.
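The damping factor and frequency quoted for this example can be recovered from the roots of $\theta(B) = 1 - 0.8B + 0.5B^2$. The short R sketch below is an added check, not part of the original text; it assumes only the base R function `polyroot()`.

```r
# Sketch: damping factor and frequency implied by theta(B) = 1 - 0.8B + 0.5B^2
roots <- polyroot(c(1, -0.8, 0.5))    # complex conjugate pair of roots of theta(B) = 0
D  <- 1 / Mod(roots[1])               # damping factor: modulus of the reciprocal root
f0 <- abs(Arg(roots[1])) / (2 * pi)   # frequency in cycles per unit time
c(D = D, f0 = f0, period = 1 / f0)
```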

The autocorrelation and partial autocorrelation functions shown in Figure 3.8 were 
generated using the function ARMAacf() in the R stats package. The commands needed 
to reproduce the graph are shown below. Note that the moving average parameters in the 
ARMAacf() function are again entered with their signs reversed since R uses positive signs 
in defining the moving average operator, rather than the negative signs used here. 


[Figure 3.8: two panels, (a) ACF and (b) PACF, each plotted against Lag.]

FIGURE 3.8 (a) Autocorrelation function and (b) partial autocorrelation function for the MA(2)
model $z_t = a_t - 0.8a_{t-1} + 0.5a_{t-2}$.
> ACF=ARMAacf(ar=0,ma=c(-0.8,+0.5),lag.max=15,pacf=FALSE)[-1]
> PACF=ARMAacf(ar=0,ma=c(-0.8,+0.5),lag.max=15,pacf=TRUE)
> par(mfrow=c(2,1))
> plot(ACF,type='h',ylim=c(-0.8,0.6),xlab='lag',main='(a): ACF')
> abline(h=0)
> plot(PACF,type='h',ylim=c(-0.8,0.6),xlab='lag',main='(b): PACF')
> abline(h=0)
> ACF   # Retrieves the autocorrelation coefficients
> PACF  # Retrieves the partial autocorrelation coefficients

3.3.5 Duality Between Autoregressive and Moving Average Processes 

The previous sections have examined the properties of autoregressive and moving average 
processes and discussed the duality between these processes. As illustrated in Table 3.2 at 
the end of this chapter, this duality has the following consequences: 

1. In a stationary autoregressive process of order $p$, $a_t$ can be represented as a finite
weighted sum of previous $z$'s, or $z_t$ as an infinite weighted sum

$$ z_t = \phi^{-1}(B)\, a_t $$

of previous $a$'s. Conversely, an invertible moving average process of order $q$, $z_t$, can
be represented as a finite weighted sum of previous $a$'s, or $a_t$ as an infinite weighted
sum

$$ \theta^{-1}(B)\, z_t = a_t $$

of previous $z$'s.

2. The finite MA process has an autocorrelation function that is zero beyond a certain 
point, but since it is equivalent to an infinite AR process, its partial autocorrelation 
function is infinite in extent and is dominated by damped exponentials and/or damped 
sine waves. Conversely, the AR process has a partial autocorrelation function that is 
zero beyond a certain point, but its autocorrelation function is infinite in extent and 
consists of a mixture of damped exponentials and/or damped sine waves. 

3. For an autoregressive process of finite order $p$, the parameters are not required to
satisfy any conditions to ensure invertibility. However, for stationarity, the roots of
$\phi(B) = 0$ must lie outside the unit circle. Conversely, the parameters of the MA
process are not required to satisfy any conditions to ensure stationarity. However, for
invertibility, the roots of $\theta(B) = 0$ must lie outside the unit circle.

4. The spectrum of a moving average process has an inverse relationship to the spectrum 
of the corresponding autoregressive process. 


3.4 MIXED AUTOREGRESSIVE-MOVING AVERAGE PROCESSES 
3.4.1 Stationarity and Invertibility Properties 

We have noted earlier that to achieve parsimony it may be necessary to include both
autoregressive and moving average terms. Thus, we may need to employ the mixed ARMA
model

$$ z_t = \phi_1 z_{t-1} + \cdots + \phi_p z_{t-p} + a_t - \theta_1 a_{t-1} - \cdots - \theta_q a_{t-q} \tag{3.4.1} $$

that is,

$$ (1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p)\, z_t = (1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q)\, a_t $$

or

$$ \phi(B)\, z_t = \theta(B)\, a_t $$

where $\phi(B)$ and $\theta(B)$ are polynomial operators in $B$ of degrees $p$ and $q$.

We subsequently refer to this process as an ARMA(p, q) process. It may be thought of 
in two ways: 

1. As a $p$th-order autoregressive process

$$ \phi(B)\, z_t = e_t $$

with $e_t$ following the $q$th-order moving average process $e_t = \theta(B) a_t$.

2. As a $q$th-order moving average process

$$ z_t = \theta(B)\, b_t $$

with $b_t$ following the $p$th-order autoregressive process $\phi(B) b_t = a_t$ so that

$$ \phi(B)\, z_t = \theta(B)\phi(B)\, b_t = \theta(B)\, a_t $$

It is obvious that moving average terms on the right of (3.4.1) will not affect the earlier
arguments, which establish conditions for stationarity of an autoregressive process. Thus,
$\phi(B) z_t = \theta(B) a_t$ will define a stationary process provided that the characteristic equation
$\phi(B) = 0$ has all its roots outside the unit circle. Similarly, the roots of $\theta(B) = 0$ must lie
outside the unit circle if the process is to be invertible.

Thus, the stationary and invertible ARMA($p, q$) process (3.4.1) has both the infinite
moving average representation

$$ z_t = \psi(B)\, a_t = \sum_{j=0}^{\infty} \psi_j a_{t-j} $$

where $\psi(B) = \phi^{-1}(B)\theta(B)$, and the infinite autoregressive representation

$$ \pi(B)\, z_t = z_t - \sum_{j=1}^{\infty} \pi_j z_{t-j} = a_t $$

where $\pi(B) = \theta^{-1}(B)\phi(B)$, with both the $\psi_j$ weights and the $\pi_j$ weights being absolutely
summable. The weights $\psi_j$ are determined from the relation $\phi(B)\psi(B) = \theta(B)$ to satisfy

$$ \psi_j = \phi_1 \psi_{j-1} + \phi_2 \psi_{j-2} + \cdots + \phi_p \psi_{j-p} - \theta_j \qquad j > 0 $$

with $\psi_0 = 1$, $\psi_j = 0$ for $j < 0$, and $\theta_j = 0$ for $j > q$, while from the relation $\theta(B)\pi(B) = \phi(B)$
the $\pi_j$ are determined to satisfy

$$ \pi_j = \theta_1 \pi_{j-1} + \theta_2 \pi_{j-2} + \cdots + \theta_q \pi_{j-q} + \phi_j \qquad j > 0 $$

with $\pi_0 = -1$, $\pi_j = 0$ for $j < 0$, and $\phi_j = 0$ for $j > p$. From these relations, the $\psi_j$ and
$\pi_j$ weights can readily be computed recursively in terms of the $\phi_i$ and $\theta_i$ coefficients.
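The recursion for the $\psi_j$ weights can be sketched in R and compared with the built-in `ARMAtoMA()` function. This is an added illustration under stated assumptions: `arma_psi` is a hypothetical helper, and the parameter values $\phi_1 = 0.8$, $\theta_1 = -0.6$ are chosen only for the example. As elsewhere, R expects the moving average coefficients with their signs reversed.

```r
# Sketch: psi weights of an ARMA(p, q) process from
#   psi_j = phi_1 psi_{j-1} + ... + phi_p psi_{j-p} - theta_j,  psi_0 = 1,
# with psi_j = 0 for j < 0 and theta_j = 0 for j > q.
# arma_psi is a hypothetical helper name.
arma_psi <- function(phi, theta, n) {
  p <- length(phi); q <- length(theta)
  psi <- numeric(n + 1)               # psi[j + 1] stores psi_j
  psi[1] <- 1
  for (j in seq_len(n)) {
    acc <- if (j <= q) -theta[j] else 0
    for (i in seq_len(min(j, p))) acc <- acc + phi[i] * psi[j - i + 1]
    psi[j + 1] <- acc
  }
  psi[-1]                             # psi_1, ..., psi_n
}

phi <- 0.8; theta <- -0.6
psi   <- arma_psi(phi, theta, 6)
psi_R <- ARMAtoMA(ar = phi, ma = -theta, lag.max = 6)   # R uses plus signs for the MA part
print(round(rbind(psi, psi_R), 4))

# A pure MA(2) has a finite psi series: psi_1 = -theta_1, psi_2 = -theta_2, then zeros
arma_psi(numeric(0), c(0.8, -0.5), 4)
```

The $\pi_j$ weights follow from the same loop with the roles of the two parameter vectors exchanged and the starting value $\pi_0 = -1$.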

3.4.2 Autocorrelation Function and Spectrum of Mixed Processes 

Autocorrelation Function. The autocorrelation function of the mixed process may be
derived by a method similar to that used for autoregressive processes in Section 3.2.2.
On multiplying throughout in (3.4.1) by $z_{t-k}$ and taking expectations, we see that the
autocovariance function satisfies the difference equation

$$ \gamma_k = \phi_1 \gamma_{k-1} + \cdots + \phi_p \gamma_{k-p} + \gamma_{za}(k) - \theta_1 \gamma_{za}(k-1) - \cdots - \theta_q \gamma_{za}(k-q) $$

where $\gamma_{za}(k)$ is the cross-covariance function between $z$ and $a$ and is defined by $\gamma_{za}(k) = E[z_{t-k} a_t]$.
Since $z_{t-k}$ depends only on shocks that have occurred up to time $t - k$ through
the infinite moving average representation $z_{t-k} = \psi(B) a_{t-k} = \sum_{j=0}^{\infty} \psi_j a_{t-k-j}$, it follows
that

$$
\gamma_{za}(k) =
\begin{cases}
0 & k > 0 \\
\psi_{-k}\,\sigma_a^2 & k \le 0
\end{cases}
$$

Hence, the preceding equation for $\gamma_k$ may be expressed as

$$ \gamma_k = \phi_1 \gamma_{k-1} + \cdots + \phi_p \gamma_{k-p} - \sigma_a^2 (\theta_k \psi_0 + \theta_{k+1} \psi_1 + \cdots + \theta_q \psi_{q-k}) \tag{3.4.2} $$

with the convention that $\theta_0 = -1$. We see that this implies

$$ \gamma_k = \phi_1 \gamma_{k-1} + \phi_2 \gamma_{k-2} + \cdots + \phi_p \gamma_{k-p} \qquad k \ge q + 1 $$

and hence

$$ \rho_k = \phi_1 \rho_{k-1} + \phi_2 \rho_{k-2} + \cdots + \phi_p \rho_{k-p} \qquad k \ge q + 1 \tag{3.4.3} $$

or

$$ \phi(B)\rho_k = 0 \qquad k \ge q + 1 $$

Thus, for the ARMA($p, q$) process, there will be $q$ autocorrelations $\rho_1, \ldots, \rho_q$ whose values
depend directly on the choice of the $q$ moving average parameters as well as on the
$p$ autoregressive parameters $\phi_j$. Also, the $p$ values $\rho_{q-p+1}, \ldots, \rho_q$ provide the necessary
starting values for the difference equation $\phi(B)\rho_k = 0$, where $k \ge q + 1$, which then entirely
determines the autocorrelations at higher lags. If $q - p < 0$, the whole autocorrelation
function $\rho_j$, for $j = 0, 1, 2, \ldots$, will consist of a mixture of damped exponentials and/or
damped sine waves, whose nature is dictated by (the roots of) the polynomial $\phi(B)$ and the
starting values. If, however, $q - p \ge 0$, there will be $q - p + 1$ initial values $\rho_0, \rho_1, \ldots, \rho_{q-p}$,
which do not follow this general pattern. These facts are useful in identifying mixed series.

Variance. When $k = 0$, we have

$$ \gamma_0 = \phi_1 \gamma_1 + \cdots + \phi_p \gamma_p + \sigma_a^2 (1 - \theta_1 \psi_1 - \cdots - \theta_q \psi_q) \tag{3.4.4} $$

which has to be solved along with the $p$ equations (3.4.2) for $k = 1, 2, \ldots, p$ to obtain
$\gamma_0, \gamma_1, \ldots, \gamma_p$.

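Spectrum. Using (3.1.12), the spectrum of the mixed ARMA($p, q$) process is

$$
\begin{aligned}
p(f) &= 2\sigma_a^2 \frac{|\theta(e^{-i2\pi f})|^2}{|\phi(e^{-i2\pi f})|^2} \\
     &= 2\sigma_a^2 \frac{|1 - \theta_1 e^{-i2\pi f} - \cdots - \theta_q e^{-i2q\pi f}|^2}{|1 - \phi_1 e^{-i2\pi f} - \cdots - \phi_p e^{-i2p\pi f}|^2} \qquad 0 \le f \le \tfrac{1}{2}
\end{aligned}
\tag{3.4.5}
$$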
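Expression (3.4.5) can be evaluated numerically. The R sketch below is an added illustration, not code from the text: `arma_spectrum` is a hypothetical helper, `sigma2` stands for $\sigma_a^2$, and the ARMA(1,1) values $\phi_1 = 0.8$, $\theta_1 = -0.6$ are assumed for the example. Since the spectrum integrates to the variance over $0 \le f \le \tfrac{1}{2}$, numerical integration gives an independent check on $\gamma_0$.

```r
# Sketch of (3.4.5): power spectrum of an ARMA(p, q) process on 0 <= f <= 1/2.
# arma_spectrum is a hypothetical helper name; sigma2 denotes sigma_a^2.
arma_spectrum <- function(f, phi = numeric(0), theta = numeric(0), sigma2 = 1) {
  # |1 - c_1 e^{-i 2 pi f} - ... - c_m e^{-i 2 m pi f}|^2 evaluated at each f
  poly_mod2 <- function(coef, f) {
    sapply(f, function(fi) {
      e <- exp(-2i * pi * fi * seq_along(coef))
      Mod(1 - sum(coef * e))^2
    })
  }
  2 * sigma2 * poly_mod2(theta, f) / poly_mod2(phi, f)
}

# Spectrum of an ARMA(1,1) process at a few frequencies
arma_spectrum(c(0, 0.1, 0.25, 0.5), phi = 0.8, theta = -0.6)

# The area under the spectrum over [0, 1/2] is the process variance gamma_0
integrate(arma_spectrum, 0, 0.5, phi = 0.8, theta = -0.6)$value
```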


Partial Autocorrelation Function. The mixed process $\phi(B) z_t = \theta(B) a_t$ can be written as

$$ a_t = \theta^{-1}(B)\phi(B)\, z_t $$

where $\theta^{-1}(B)$ is an infinite series in $B$. Hence, the partial autocorrelation function of a
mixed process is infinite in extent. It behaves eventually like the partial autocorrelation
function of a pure moving average process, being dominated by a mixture of damped
exponentials and/or damped sine waves, depending on the order of the moving average and
the values of the parameters it contains.


3.4.3 First-Order Autoregressive First-Order Moving Average Process

A mixed ARMA process of considerable practical importance is the ARMA(1,1) process

$$ z_t - \phi_1 z_{t-1} = a_t - \theta_1 a_{t-1} \tag{3.4.6} $$

that is,

$$ (1 - \phi_1 B)\, z_t = (1 - \theta_1 B)\, a_t $$

We now derive some of its more important properties.

Stationarity and Invertibility Conditions. First, we note that the process is stationary if
$-1 < \phi_1 < 1$, and invertible if $-1 < \theta_1 < 1$. Hence, the admissible parameter space is the
square shown in Figure 3.9(a). In addition, from the relations $\psi_1 = \phi_1 \psi_0 - \theta_1 = \phi_1 - \theta_1$
and $\psi_j = \phi_1 \psi_{j-1}$ for $j > 1$, we find that the $\psi_j$ weights are given by $\psi_j = (\phi_1 - \theta_1)\phi_1^{j-1}$,
$j \ge 1$, and similarly it is easily seen that $\pi_j = (\phi_1 - \theta_1)\theta_1^{j-1}$, $j \ge 1$, for the stationary and
invertible ARMA(1,1) process.

Autocorrelation Function. From (3.4.2) and (3.4.4) we obtain

$$
\begin{aligned}
\gamma_0 &= \phi_1 \gamma_1 + \sigma_a^2 (1 - \theta_1 \psi_1) \\
\gamma_1 &= \phi_1 \gamma_0 - \theta_1 \sigma_a^2 \\
\gamma_k &= \phi_1 \gamma_{k-1} \qquad k \ge 2
\end{aligned}
$$

with $\psi_1 = \phi_1 - \theta_1$. Hence, solving the first two equations for $\gamma_0$ and $\gamma_1$, the autocovariance
function of the process is

$$
\begin{aligned}
\gamma_0 &= \frac{1 + \theta_1^2 - 2\phi_1\theta_1}{1 - \phi_1^2}\,\sigma_a^2 \\
\gamma_1 &= \frac{(1 - \phi_1\theta_1)(\phi_1 - \theta_1)}{1 - \phi_1^2}\,\sigma_a^2 \\
\gamma_k &= \phi_1 \gamma_{k-1} \qquad k \ge 2
\end{aligned}
\tag{3.4.7}
$$

The last equation gives $\rho_k = \phi_1 \rho_{k-1}$, $k \ge 2$, so that $\rho_k = \phi_1^{k-1}\rho_1$, $k > 1$. Thus, the
autocorrelation function decays exponentially from the starting value $\rho_1$, which depends on $\theta_1$
and $\phi_1$.² This exponential decay is smooth if $\phi_1$ is positive and alternates if $\phi_1$ is negative.
Furthermore, the sign of $\rho_1$ is determined by the sign of $(\phi_1 - \theta_1)$ and dictates from which
side of zero the exponential decay takes place.

² By contrast, the autocorrelation function for the AR(1) process decays exponentially from the starting value
$\rho_0 = 1$.

FIGURE 3.9 Admissible regions for (a) $\phi_1, \theta_1$ and (b) $\rho_1, \rho_2$ for a stationary and invertible
ARMA(1,1) process.

FIGURE 3.10 Autocorrelation and partial autocorrelation functions $\rho_k$ and $\phi_{kk}$ for various
ARMA(1,1) models.


The first two autocorrelations may be expressed in terms of the parameters of the
ARMA(1,1) process, as follows:

$$
\begin{aligned}
\rho_1 &= \frac{(1 - \phi_1\theta_1)(\phi_1 - \theta_1)}{1 + \theta_1^2 - 2\phi_1\theta_1} \\
\rho_2 &= \phi_1 \rho_1
\end{aligned}
\tag{3.4.8}
$$

Using these expressions and the stationarity and invertibility conditions, it may be shown
that $\rho_1$ and $\rho_2$ must lie in the region

$$
\begin{aligned}
|\rho_2| &< |\rho_1| \\
\rho_2 &> \rho_1 (2\rho_1 + 1) \qquad \rho_1 < 0 \\
\rho_2 &> \rho_1 (2\rho_1 - 1) \qquad \rho_1 > 0
\end{aligned}
\tag{3.4.9}
$$

Figure 3.9(b) shows the admissible space for $\rho_1$ and $\rho_2$, that is, it indicates which
combinations of $\rho_1$ and $\rho_2$ are possible for a mixed (1,1) stationary, invertible process.
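The region (3.4.9) can be spot-checked by evaluating (3.4.8) over a grid of admissible parameter values. The R sketch below is an added illustration; the grid spacing of 0.1 is an arbitrary choice, and points with $\phi_1 = \theta_1$ are excluded because the model then reduces to white noise with $\rho_1 = \rho_2 = 0$.

```r
# Sketch: check the region (3.4.9) on a grid of stationary, invertible
# ARMA(1,1) parameter values (the grid itself is an arbitrary choice).
g <- expand.grid(phi   = seq(-0.9, 0.9, by = 0.1),
                 theta = seq(-0.9, 0.9, by = 0.1))
g <- g[g$phi != g$theta, ]             # phi = theta gives white noise, rho_k = 0

rho1 <- with(g, (1 - phi * theta) * (phi - theta) / (1 + theta^2 - 2 * phi * theta))
rho2 <- g$phi * rho1                   # second line of (3.4.8)

cond1 <- abs(rho2) < abs(rho1)
cond2 <- ifelse(rho1 < 0,
                rho2 > rho1 * (2 * rho1 + 1),
                rho2 > rho1 * (2 * rho1 - 1))
all(cond1 & cond2)                     # TRUE if every grid point lies in the region
```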


Partial Autocorrelation Function. The partial autocorrelation function of the mixed
ARMA(1,1) process consists of a single initial value $\phi_{11} = \rho_1$. Thereafter, it behaves
like the partial autocorrelation function of a pure MA(1) process and is dominated by a
damped exponential. Thus, as shown in Figure 3.10, when $\theta_1$ is positive, it is dominated
by a smoothly damped exponential that decays from a value of $\rho_1$, with sign determined by
the sign of $(\phi_1 - \theta_1)$. Similarly, when $\theta_1$ is negative, it is dominated by an exponential that
oscillates as it decays from a value of $\rho_1$, with sign determined by the sign of $(\phi_1 - \theta_1)$.

FIGURE 3.11 Theoretical autocorrelation and partial autocorrelation functions of an ARMA(1,1)
process with $\phi = 0.8$ and $\theta = -0.6$.


Numerical Example. For numerical illustration, consider the ARMA(1,1) process,

$$ (1 - 0.8B)\, z_t = (1 + 0.6B)\, a_t $$

so that $\phi = 0.8$ and $\theta = -0.6$. Further assuming $\sigma_a^2 = 1$, we find from (3.4.7) and (3.4.8)
that the variance of $z_t$ is $\gamma_0 = 6.444$, and $\rho_1 = 0.893$. Also, the autocorrelation function
satisfies $\rho_j = 0.8\rho_{j-1}$, $j \ge 2$, so that $\rho_j = 0.893(0.8)^{j-1}$, for $j \ge 2$.

The autocorrelation and partial autocorrelation functions are shown in Figure 3.11.
The exponential decay in the autocorrelation function is clearly evident from the graph.
The partial autocorrelation function also exhibits an exponentially decaying pattern that
oscillates in sign due to the negative value of $\theta$. The figure was generated in R using the
commands included below. Notice again that the parameter $\theta$, although negative in this
example, is entered as +0.6 since R defines the MA operator $\theta(B)$ as $(1 + \theta B)$ rather than
$(1 - \theta B)$ as done in this text.

> ACF=ARMAacf(ar=0.8,ma=0.6,20)[-1]
> PACF=ARMAacf(ar=0.8,ma=0.6,20,pacf=TRUE)
> dev.new(width=8,height=4)  # portable replacement for the Windows-only win.graph()
> par(mfrow=c(1,2))
> plot(ACF,type="h",xlab="lag");abline(h=0)
> plot(PACF,type="h",xlab="lag");abline(h=0)
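The values $\gamma_0 = 6.444$ and $\rho_1 = 0.893$ quoted in the numerical example can be reproduced directly from (3.4.7) and (3.4.8). The following R lines are an added check, not part of the original commands; the variable names are chosen here for illustration.

```r
# Sketch: gamma_0 and rho_1 for (1 - 0.8B) z_t = (1 + 0.6B) a_t with sigma_a^2 = 1
phi <- 0.8; theta <- -0.6; sigma2 <- 1

gamma0 <- (1 + theta^2 - 2 * phi * theta) / (1 - phi^2) * sigma2                 # (3.4.7)
rho1   <- (1 - phi * theta) * (phi - theta) / (1 + theta^2 - 2 * phi * theta)    # (3.4.8)
rho    <- rho1 * phi^(0:4)             # rho_1, ..., rho_5 from rho_j = 0.8 rho_{j-1}
round(c(gamma0 = gamma0, rho1 = rho1), 3)

# Compare with stats::ARMAacf, which takes the MA coefficient as -theta
rho_R <- as.numeric(ARMAacf(ar = phi, ma = -theta, lag.max = 5)[-1])
max(abs(rho - rho_R))
```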


3.4.4 Summary 

Figure 3.12 brings together the admissible regions for the parameters and for the
autocorrelations $\rho_1, \rho_2$ for AR(2), MA(2), and ARMA(1,1) processes, which are restricted to
being both stationary and invertible. Table 3.2 summarizes the properties of mixed ARMA
processes and brings together all the important results for autoregressive, moving average,
and mixed processes, which will be needed in Chapter 6 to identify models for observed
time series. In the next chapter, we extend the mixed ARMA model to produce models that
can describe nonstationary behavior of the kind that is frequently met in practice.

[Figure 3.12: panels for AR(2), MA(2), and ARMA(1,1).]

FIGURE 3.12 Admissible regions for the parameters and $\rho_1, \rho_2$ for AR(2), MA(2), and
ARMA(1,1) processes that are restricted to being both stationary and invertible.


APPENDIX A3.1 AUTOCOVARIANCES, AUTOCOVARIANCE 
GENERATING FUNCTION, AND STATIONARITY CONDITIONS FOR A 
GENERAL LINEAR PROCESS 


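Autocovariances. The autocovariance at lag $k$ of the linear process

$$ z_t = \sum_{j=0}^{\infty} \psi_j a_{t-j} $$

with $\psi_0 = 1$ is clearly

$$
\begin{aligned}
\gamma_k &= E[z_t z_{t+k}] \\
         &= E\left[ \sum_{j=0}^{\infty} \sum_{h=0}^{\infty} \psi_j \psi_h a_{t-j} a_{t+k-h} \right] \\
         &= \sigma_a^2 \sum_{j=0}^{\infty} \psi_j \psi_{j+k}
\end{aligned}
\tag{A3.1.1}
$$

using the property (3.1.2) for the autocovariance function of white noise.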
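Result (A3.1.1) can be evaluated numerically once the $\psi_j$ weights are available, truncating the infinite sum when the weights are negligible. The R sketch below is an added illustration; `acov_from_psi` is a hypothetical helper, and the two test models (an MA(2) with $\theta_1 = 0.8$, $\theta_2 = -0.5$ and an AR(1) with $\phi = 0.5$) are assumptions chosen for the example.

```r
# Sketch of (A3.1.1): gamma_k = sigma_a^2 * sum_j psi_j psi_{j+k}, with the
# infinite sum truncated at the length of the supplied psi vector.
# acov_from_psi is a hypothetical helper; psi must start with psi_0 = 1.
acov_from_psi <- function(psi, k, sigma2 = 1) {
  n <- length(psi)
  if (k >= n) return(0)
  sigma2 * sum(psi[1:(n - k)] * psi[(1 + k):n])
}

# MA(2) with theta_1 = 0.8, theta_2 = -0.5: psi = (1, -theta_1, -theta_2) is finite
psi_ma2 <- c(1, -0.8, 0.5)
sapply(0:3, function(k) acov_from_psi(psi_ma2, k))   # gamma_0, ..., gamma_3

# AR(1) with phi = 0.5: psi_j = 0.5^j, truncated after 200 terms
psi_ar1 <- 0.5^(0:200)
acov_from_psi(psi_ar1, 0)                            # close to 1 / (1 - 0.25)
```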


Autocovariance Generating Function. The result (A3.1.1) may be substituted in the
autocovariance generating function

$$ \gamma(B) = \sum_{k=-\infty}^{\infty} \gamma_k B^k \tag{A3.1.2} $$








TABLE 3.2 Summary of Properties of Autoregressive, Moving Average, and Mixed ARMA Processes

| | Autoregressive Processes | Moving Average Processes | Mixed Processes |
|---|---|---|---|
| Model in terms of previous $z$'s | $\phi(B) z_t = a_t$ | $\theta^{-1}(B) z_t = a_t$ | $\theta^{-1}(B)\phi(B) z_t = a_t$ |
| Model in terms of previous $a$'s | $z_t = \phi^{-1}(B) a_t$ | $z_t = \theta(B) a_t$ | $z_t = \phi^{-1}(B)\theta(B) a_t$ |
| $\pi$ weights | Finite series | Infinite series | Infinite series |
| $\psi$ weights | Infinite series | Finite series | Infinite series |
| Stationarity condition | Roots of $\phi(B) = 0$ lie outside the unit circle | Always stationary | Roots of $\phi(B) = 0$ lie outside the unit circle |
| Invertibility condition | Always invertible | Roots of $\theta(B) = 0$ lie outside the unit circle | Roots of $\theta(B) = 0$ lie outside the unit circle |
| Autocorrelation function | Infinite (damped exponentials and/or damped sine waves). Tails off | Finite. Cuts off after lag $q$ | Infinite (damped exponentials and/or damped sine waves after first $q - p$ lags). Tails off |
| Partial autocorrelation function | Finite. Cuts off after lag $p$ | Infinite (dominated by damped exponentials and/or damped sine waves). Tails off | Infinite (dominated by damped exponentials and/or damped sine waves after first $p - q$ lags). Tails off |
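to give

$$
\begin{aligned}
\gamma(B) &= \sigma_a^2 \sum_{k=-\infty}^{\infty} \sum_{j=0}^{\infty} \psi_j \psi_{j+k} B^k \\
          &= \sigma_a^2 \sum_{j=0}^{\infty} \sum_{k=-j}^{\infty} \psi_j \psi_{j+k} B^k
\end{aligned}
$$

since $\psi_h = 0$ for $h < 0$. Writing $j + k = h$, so that $k = h - j$, we have

$$
\begin{aligned}
\gamma(B) &= \sigma_a^2 \sum_{j=0}^{\infty} \sum_{h=0}^{\infty} \psi_j \psi_h B^{h-j} \\
          &= \sigma_a^2 \sum_{h=0}^{\infty} \psi_h B^h \sum_{j=0}^{\infty} \psi_j B^{-j}
\end{aligned}
$$

that is,

$$ \gamma(B) = \sigma_a^2 \psi(B)\psi(B^{-1}) = \sigma_a^2 \psi(B)\psi(F) \tag{A3.1.3} $$

which is the result (3.1.11) quoted in the text.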


Stationarity Conditions. If we substitute $B = e^{-i2\pi f}$ and $F = B^{-1} = e^{i2\pi f}$ in the
autocovariance generating function (A3.1.2), we obtain half the power spectrum. Hence, the
power spectrum of a linear process is

$$
\begin{aligned}
p(f) &= 2\sigma_a^2 \psi(e^{-i2\pi f})\psi(e^{i2\pi f}) \\
     &= 2\sigma_a^2 |\psi(e^{-i2\pi f})|^2 \qquad 0 \le f \le \tfrac{1}{2}
\end{aligned}
\tag{A3.1.4}
$$

It follows that the variance of the process is

$$ \sigma_z^2 = \int_0^{1/2} p(f)\, df = 2\sigma_a^2 \int_0^{1/2} \psi(e^{-i2\pi f})\psi(e^{i2\pi f})\, df \tag{A3.1.5} $$

Now if the integral (A3.1.5) is to converge, it may be shown (Grenander and Rosenblatt,
1957) that the infinite series $\psi(B)$ must converge for $B$ on or within the unit circle.
More directly, for the linear process $z_t = \sum_{j=0}^{\infty} \psi_j a_{t-j}$, the condition $\sum_{j=0}^{\infty} |\psi_j| < \infty$ of
absolute summability of the coefficients $\psi_j$ implies (see Brockwell and Davis, 1991;
Fuller, 1996) that the sum $\sum_{j=0}^{\infty} \psi_j a_{t-j}$ converges with probability 1 and hence represents
a valid stationary process.


APPENDIX A3.2 RECURSIVE METHOD FOR CALCULATING ESTIMATES 
OF AUTOREGRESSIVE PARAMETERS 

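We now show how Yule-Walker estimates for the parameters of an AR($p + 1$) model may
be obtained from the estimates for an AR($p$) model fitted to the same time series. This
recursive method of calculation, which is due to Levinson (1947) and Durbin (1960), can
be used to approximate the partial autocorrelation function, as described in Section 3.2.6.

To illustrate the recursion, consider equations (3.2.35). Yule-Walker estimates are
obtained for $k = 2, 3$ from

$$
\begin{aligned}
r_2 &= \hat\phi_{21} r_1 + \hat\phi_{22} \\
r_1 &= \hat\phi_{21} + \hat\phi_{22} r_1
\end{aligned}
\tag{A3.2.1}
$$

and

$$
\begin{aligned}
r_3 &= \hat\phi_{31} r_2 + \hat\phi_{32} r_1 + \hat\phi_{33} \\
r_2 &= \hat\phi_{31} r_1 + \hat\phi_{32} + \hat\phi_{33} r_1 \\
r_1 &= \hat\phi_{31} + \hat\phi_{32} r_1 + \hat\phi_{33} r_2
\end{aligned}
\tag{A3.2.2}
$$

The coefficients $\hat\phi_{31}$ and $\hat\phi_{32}$ may be expressed in terms of $\hat\phi_{33}$ using the last two equations
of (A3.2.2). The solution may be written in matrix form as

$$
\begin{bmatrix} \hat\phi_{31} \\ \hat\phi_{32} \end{bmatrix}
= \mathbf{R}_2^{-1}
\begin{bmatrix} r_2 - \hat\phi_{33} r_1 \\ r_1 - \hat\phi_{33} r_2 \end{bmatrix}
\tag{A3.2.3}
$$

where

$$ \mathbf{R}_2 = \begin{bmatrix} r_1 & 1 \\ 1 & r_1 \end{bmatrix} $$

Now, (A3.2.3) may be written as

$$
\begin{bmatrix} \hat\phi_{31} \\ \hat\phi_{32} \end{bmatrix}
= \mathbf{R}_2^{-1} \begin{bmatrix} r_2 \\ r_1 \end{bmatrix}
- \hat\phi_{33} \mathbf{R}_2^{-1} \begin{bmatrix} r_1 \\ r_2 \end{bmatrix}
\tag{A3.2.4}
$$

Using the fact that (A3.2.1) may also be written as

$$
\begin{bmatrix} \hat\phi_{21} \\ \hat\phi_{22} \end{bmatrix}
= \mathbf{R}_2^{-1} \begin{bmatrix} r_2 \\ r_1 \end{bmatrix}
$$

it follows that (A3.2.4) becomes

$$
\begin{bmatrix} \hat\phi_{31} \\ \hat\phi_{32} \end{bmatrix}
= \begin{bmatrix} \hat\phi_{21} \\ \hat\phi_{22} \end{bmatrix}
- \hat\phi_{33} \begin{bmatrix} \hat\phi_{22} \\ \hat\phi_{21} \end{bmatrix}
$$

that is,

$$
\begin{aligned}
\hat\phi_{31} &= \hat\phi_{21} - \hat\phi_{33}\hat\phi_{22} \\
\hat\phi_{32} &= \hat\phi_{22} - \hat\phi_{33}\hat\phi_{21}
\end{aligned}
\tag{A3.2.5}
$$

To complete the calculation of $\hat\phi_{31}$ and $\hat\phi_{32}$, we need an expression for $\hat\phi_{33}$. On substituting
(A3.2.5) in the first of the equations (A3.2.2), we obtain

$$ \hat\phi_{33} = \frac{r_3 - \hat\phi_{21} r_2 - \hat\phi_{22} r_1}{1 - \hat\phi_{21} r_1 - \hat\phi_{22} r_2} \tag{A3.2.6} $$

Thus, the partial autocorrelation $\hat\phi_{33}$ is first calculated from $\hat\phi_{21}$ and $\hat\phi_{22}$, using (A3.2.6),
and then the other two coefficients, $\hat\phi_{31}$ and $\hat\phi_{32}$, may be obtained from (A3.2.5).

In general, the recursive formulas are

$$ \hat\phi_{p+1,j} = \hat\phi_{pj} - \hat\phi_{p+1,p+1}\,\hat\phi_{p,p+1-j} \qquad j = 1, 2, \ldots, p \tag{A3.2.7} $$

$$ \hat\phi_{p+1,p+1} = \frac{r_{p+1} - \sum_{j=1}^{p} \hat\phi_{pj}\, r_{p+1-j}}{1 - \sum_{j=1}^{p} \hat\phi_{pj}\, r_j} \tag{A3.2.8} $$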
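The recursion (A3.2.7) and (A3.2.8) can be sketched in R as follows. This is an added illustration, not code from the text: `levinson_durbin` is a hypothetical helper, and for the check it is fed the theoretical autocorrelations of the MA(2) model $z_t = a_t - 0.8a_{t-1} + 0.5a_{t-2}$ rather than sample estimates $r_k$.

```r
# Sketch of the recursion (A3.2.7)-(A3.2.8).
# r holds r_1, ..., r_K; levinson_durbin is a hypothetical helper name.
levinson_durbin <- function(r) {
  K <- length(r)
  pacf <- numeric(K)
  phi <- r[1]                          # AR(1) coefficient phi_11 = r_1
  pacf[1] <- r[1]
  if (K >= 2) for (p in 1:(K - 1)) {
    num <- r[p + 1] - sum(phi * r[p:1])       # r_{p+1} - sum_j phi_pj r_{p+1-j}
    den <- 1 - sum(phi * r[1:p])              # 1 - sum_j phi_pj r_j
    phi_new <- num / den                      # phi_{p+1,p+1}, from (A3.2.8)
    phi <- c(phi - phi_new * rev(phi), phi_new)   # update via (A3.2.7)
    pacf[p + 1] <- phi_new
  }
  list(pacf = pacf, phi = phi)         # partial autocorrelations and AR(K) coefficients
}

# Theoretical autocorrelations of the MA(2) model (R takes the MA signs reversed)
rho <- as.numeric(ARMAacf(ma = c(-0.8, 0.5), lag.max = 5)[-1])
ld <- levinson_durbin(rho)
round(ld$pacf, 3)

# Compare with the partial autocorrelations computed by stats::ARMAacf
pacf_R <- as.numeric(ARMAacf(ma = c(-0.8, 0.5), lag.max = 5, pacf = TRUE))
max(abs(ld$pacf - pacf_R))
```

With sample autocorrelations in place of `rho`, the same function returns the Yule-Walker estimates for successive autoregressive orders, the last coefficient at each order being the estimated partial autocorrelation.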


EXERCISES 

3.1 Write the following models in B notation: 

(1) $z_t - 0.5z_{t-1} = a_t$

(2) $z_t = a_t - 1.3a_{t-1} + 0.4a_{t-2}$

(3) $z_t - 0.5z_{t-1} = a_t - 1.3a_{t-1} + 0.4a_{t-2}$

3.2 For each of the models of Exercise 3.1 and also for the following models, state 
whether it is (a) stationary or (b) invertible. 

(4) $z_t - 1.5z_{t-1} + 0.6z_{t-2} = a_t$

(5) $z_t - z_{t-1} = a_t - 0.5a_{t-1}$

(6) $z_t - z_{t-1} = a_t - 1.3a_{t-1} + 0.3a_{t-2}$

3.3. For each of the models in Exercise 3.1, obtain:

(a) The first four $\psi_j$ weights

(b) The first four $\pi_j$ weights

(c) The autocovariance generating function

(d) The first four autocorrelations $\rho_j$

(e) The variance of $z_t$, assuming that $\sigma_a^2 = 1.0$

3.4. Calculate the first fifteen $\psi_j$ weights for each of the three models in Exercise 3.2
using the function ARMAtoMA in R. See help(ARMAtoMA) for details.

3.5. Classify each of the models (1) to (4) in Exercises 3.1 and 3.2 as a member of the
class of ARMA($p, q$) processes.

3.6. (a) Write down the Yule-Walker equations for models (1) and (4) considered in
Exercises 3.1 and 3.2.

(b) Solve these equations to obtain $\rho_1$ and $\rho_2$ for the two models.

(c) Obtain the partial autocorrelation function for the two models. 

3.7. Consider the first-order autoregressive model $z_t = \theta_0 + \phi z_{t-1} + a_t$, where the
constant $\theta_0$ is a function of the mean of the series.

(a) Derive the autocovariances $\gamma_k = E([z_t - \mu][z_{t-k} - \mu])$ for this series.

(b) Calculate and plot the autocorrelation function for $\phi = 0.8$ using the R command
ARMAacf(); see help(ARMAacf) for details.

(c) Calculate and plot the partial autocorrelation function for the same process.

3.8. Consider the mixed ARMA(1,1) model $z_t - \phi z_{t-1} = a_t - \theta a_{t-1}$, where $-1 < \phi < 1$
and $E(z_t)$ is assumed to be zero for convenience.

(a) Derive the autocovariances $\gamma_k = E([z_t - \mu][z_{t-k} - \mu])$ for this series.

(b) Calculate and plot the autocorrelation function for $\phi = 0.9$ and $\theta = -0.3$ using
R (see Exercise 3.7). 

(c) Calculate and plot the partial autocorrelation function for the same process. 

3.9. For the AR(2) process $z_t - 1.0z_{t-1} + 0.5z_{t-2} = a_t$:

(a) Calculate $\rho_1$.

(b) Using $\rho_0$ and $\rho_1$ as starting values and the difference equation form for the
autocorrelation function, calculate the values of $\rho_k$ for $k = 2, \ldots, 15$.

(c) Use the plotted autocorrelation function to estimate the period and damping factor
of the autocorrelation function.

(d) Check the values in (c) by direct calculation using the parameter values and the
related roots $G_1^{-1}$ and $G_2^{-1}$ of $\phi(B) = 1 - 1.0B + 0.5B^2$.

3.10. (a) Plot the power spectrum $g(f)$ of the autoregressive process of Exercise 3.9, and
show that it has a peak at a period that is close to the period in the autocorrelation
function.

(b) Graphically, or otherwise, estimate the proportion of the variance of the series in
the frequency band between $f = 0.0$ and $f = 0.2$ cycle per data interval.

3.11. (a) Why is it important to factorize the autoregressive and moving average operators
after fitting a model to an observed series?

(b) It was shown by Jenkins (1975) that the number of mink skins $z_t$ traded annually
between 1848 and 1909 in North Canada is adequately represented by the AR(4)
model

$$ (1 - 0.82B + 0.22B^2 + 0.28B^4)[\ln(z_t) - \mu] = a_t $$

Factorize the autoregressive operator and explain what the factors reveal about the
autocorrelation function and the underlying nature of the mink series. The data for
the period 1850-1911 are listed as Series N in Part Five of this book. Note that the
roots of $\phi(B) = 0$ can be calculated using the R command polyroot(), where the
autoregressive parameters are entered with their signs reversed; see help(polyroot)
for details.

3.12. Calculate and plot the theoretical autocorrelation function and partial autocorrelation 
function for the AR(4) model specified in Exercise 3.11(b). 
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LINEAR NONSTATIONARY MODELS 


Many empirical time series (e.g., stock price series) behave as though they had no fixed 
mean. Even so, they exhibit homogeneity in the sense that apart from local level, or perhaps 
local level and trend, one part of the series behaves much like any other part. Models that 
describe such homogeneous nonstationary behavior can be obtained by assuming that some 
suitable difference of the process is stationary. In this chapter, we examine the properties of
the important class of models for which the $d$th difference of the series is a stationary mixed
autoregressive-moving average process. These models are called autoregressive integrated
moving average (ARIMA) processes.


4.1 AUTOREGRESSIVE INTEGRATED MOVING AVERAGE PROCESSES 

4.1.1 Nonstationary First-Order Autoregressive Process 

Figure 4.1 shows four time series that have arisen in forecasting and control problems.
All of them exhibit behavior suggestive of nonstationarity. Series A, C, and D represent
“uncontrolled” outputs (concentration, temperature, and viscosity, respectively) from
three different chemical processes. These series were collected to show the effect on these
outputs of uncontrolled and unmeasured disturbances such as variations in feedstock and
ambient temperature. The temperature Series C was obtained by temporarily disconnecting
the controllers on the pilot plant involved and recording the subsequent temperature
fluctuations. Both A and D were collected on full-scale processes, where it was necessary to
maintain some output quality characteristic as close as possible to a fixed level. To achieve
this control, another variable had been manipulated to approximately cancel out variations
in the output. However, the effect of these manipulations on the output was accurately
known in each case, so that it was possible to compensate numerically for the control
action. That is, it was possible to calculate very nearly the values of the series that would have
been obtained if no corrective action had been taken. It is these compensated values that are
recorded here and referred to as the “uncontrolled” series. Series B consists of the daily
IBM stock prices during a period beginning in May 1961. A complete list of all the series
is given in the collection of time series at the end of this book. In Figure 4.1, 100 successive
observations have been plotted from each series and the points joined by straight lines.

FIGURE 4.1 Typical time series arising in forecasting and control problems.

There are an unlimited number of ways in which a process can be nonstationary.
However, the types of economic and industrial series that we wish to analyze frequently
exhibit a particular kind of homogeneous nonstationary behavior that can be represented
by a stochastic model, which is a modified form of the autoregressive-moving average
(ARMA) model. In Chapter 3, we considered the mixed ARMA model

$$\phi(B)z_t = \theta(B)a_t \qquad (4.1.1)$$

with $\phi(B)$ and $\theta(B)$ polynomial operators in $B$ of degree $p$ and $q$, respectively. To ensure
stationarity, the roots of $\phi(B) = 0$ must lie outside the unit circle. A natural way of obtaining
nonstationary processes is to relax this restriction.

FIGURE 4.2 Realization of the nonstationary first-order autoregressive process $z_t = 2z_{t-1} + a_t$
with $\sigma_a^2 = 1$.

To gain some insight into the possibilities, consider the first-order autoregressive model,

$$(1 - \phi B)z_t = a_t \qquad (4.1.2)$$

which is stationary for $|\phi| < 1$. Let us study the behavior of this process for $\phi = 2$, a
value outside the stationary range. Figure 4.2 shows a series $z_t$ generated by the model
$z_t = 2z_{t-1} + a_t$ using a set of unit random normal deviates $a_t$ and setting $z_0 = 0.7$. It is
seen that after a short induction period, the series “breaks loose” and essentially follows
an exponential curve, with the generating $a_t$’s playing almost no further part. The behavior
of series generated by processes of higher order, which violate the stationarity condition, is
similar. Furthermore, this behavior is essentially the same whether or not moving average
terms are introduced on the right of the model.
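
The explosive behavior described above is easy to reproduce. The following Python sketch is
an illustration only and is not taken from the text: the function name, the seed, and the series
length are assumptions. It generates $z_t = 2z_{t-1} + a_t$ from $z_0 = 0.7$ with unit normal shocks
and computes the ratio of successive values, which settles close to 2 once the series has
broken loose.

```python
import random

def simulate_ar1(phi, z0, n, seed=0):
    """Generate z_0, z_1, ..., z_n from z_t = phi*z_{t-1} + a_t with N(0, 1) shocks."""
    rng = random.Random(seed)      # fixed seed only so that runs are repeatable
    z = [z0]
    for _ in range(n):
        a_t = rng.gauss(0.0, 1.0)  # unit random normal deviate
        z.append(phi * z[-1] + a_t)
    return z

z = simulate_ar1(phi=2.0, z0=0.7, n=30)
# Ratios z_t / z_{t-1} late in the series: the shocks no longer matter much
ratios = [z[t] / z[t - 1] for t in range(26, 31)]
print(ratios)
```

Replacing `phi=2.0` by a value inside the stationary range, such as 0.5, gives ratios that
keep fluctuating instead of settling down.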

4.1.2 General Model for a Nonstationary Process Exhibiting Homogeneity 

Autoregressive Integrated Moving Average Model. Although nonstationary models of the
kind described above are of value to represent explosive or evolutionary behavior (such
as bacterial growth), the applications that we describe in this book are not of this type. So
far, we have seen that an ARMA process is stationary if the roots of $\phi(B) = 0$ lie outside
the unit circle, and exhibits explosive nonstationary behavior if the roots lie inside the unit
circle. The only case remaining is that the roots of $\phi(B) = 0$ lie on the unit circle. It turns
out that the resulting models are of great value in representing homogeneous nonstationary
time series. In particular, nonseasonal series are often well represented by models in which
one or more of these roots are unity and these are considered in the present chapter.¹

Let us consider the model

$$\varphi(B)z_t = \theta(B)a_t \qquad (4.1.3)$$

where $\varphi(B)$ is a nonstationary autoregressive operator such that $d$ of the roots of $\varphi(B) = 0$
are unity and the remainder lie outside the unit circle. Then the model can be written as

$$\varphi(B)z_t = \phi(B)(1 - B)^d z_t = \theta(B)a_t \qquad (4.1.4)$$

where $\phi(B)$ is a stationary autoregressive operator. Since $\nabla^d \tilde{z}_t = \nabla^d z_t$ for $d \geq 1$, where
$\nabla = 1 - B$ is the differencing operator and $\tilde{z}_t = z_t - \mu$ is the deviation from the mean,
we can write the model as

$$\phi(B)\nabla^d z_t = \theta(B)a_t \qquad (4.1.5)$$

Equivalently, the process is defined by the two equations

$$\phi(B)w_t = \theta(B)a_t \qquad (4.1.6)$$

and

$$w_t = \nabla^d z_t \qquad (4.1.7)$$

Thus, we see that the model corresponds to assuming that the $d$th difference of the series
can be represented by a stationary, invertible ARMA process. An alternative way of looking
at the process for $d \geq 1$ results from inverting (4.1.7) to give

$$z_t = S^d w_t \qquad (4.1.8)$$

where $S$ is the infinite summation operator defined by

$$S x_t = \sum_{h=-\infty}^{t} x_h = (1 + B + B^2 + \cdots)x_t = (1 - B)^{-1}x_t = \nabla^{-1}x_t$$

Thus,

$$S = (1 - B)^{-1} = \nabla^{-1}$$

The operator $S^2$ is similarly defined as

$$S^2 x_t = S x_t + S x_{t-1} + S x_{t-2} + \cdots = \sum_{i=-\infty}^{t}\sum_{h=-\infty}^{i} x_h = (1 + 2B + 3B^2 + \cdots)x_t$$

and so on for higher order $d$. Equation (4.1.8) implies that the process (4.1.5) can be
obtained by summing (or “integrating”) the stationary process (4.1.6) $d$ times. Therefore,
we call the process (4.1.5) an autoregressive integrated moving average (ARIMA) process.
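
The relation between differencing and summation can be illustrated with a short Python
sketch. It is not part of the text: the function names are assumptions, and finite sums started
from an assumed initial value replace the infinite summation operator, which cannot be
evaluated on a finite record. Summing a series $d$ times and then differencing it $d$ times
returns the original values.

```python
def difference(x, d=1):
    """Apply (1 - B)^d to the list x; each pass drops one leading value."""
    for _ in range(d):
        x = [x[i] - x[i - 1] for i in range(1, len(x))]
    return x

def integrate(w, d=1, start=0.0):
    """Sum w a total of d times; each pass prepends the assumed starting value."""
    for _ in range(d):
        z = [start]
        for w_t in w:
            z.append(z[-1] + w_t)   # z_t = z_{t-1} + w_t
        w = z
    return w

w = [1, -2, 3, 0, 4]          # stands in for a stationary series w_t
z = integrate(w, d=2)         # twice-summed series
print(z)
print(difference(z, d=2))     # differencing twice recovers w
```

The starting value plays the role of the initializing condition: changing `start` shifts the
level (and, for `d=2`, the slope) of `z` but leaves its differences unchanged.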


¹ In Chapter 9, we consider models, capable of representing seasonality of period $s$, for which the characteristic
equation has roots lying on the unit circle that are the $s$th roots of unity.

The ARIMA models for nonstationary time series, which were also considered earlier by
Yaglom (1955), are of fundamental importance for forecasting and control as discussed 
by Box and Jenkins (1962, 1963, 1965, 1968a, 1968b, 1969) and Box et al. (1967a). 
Nonstationary processes were also discussed by Zadeh and Ragazzini (1950), Kalman 
(1960), and Kalman and Bucy (1961). An earlier procedure for time series analysis that 
employed differencing was the variate difference method (see Tintner (1940) and Rao and 
Tintner (1963)). However, the motivation, methods, and objectives of this procedure were 
quite different from those discussed here. 

Technically, the infinite summation operator $S = (1 - B)^{-1}$ in (4.1.8) cannot actually
be used in defining the nonstationary ARIMA processes, since the infinite sums involved
will not be convergent. Instead, we can consider the finite summation operator $S_m$ for any
positive integer $m$, given by

$$S_m = (1 + B + B^2 + \cdots + B^{m-1}) = \frac{1 - B^m}{1 - B}$$

Similarly, the finite double summation operator can be defined as

$$S_m^{(2)} = \sum_{j=0}^{m-1}\sum_{i=j}^{m-1} B^i = (1 + 2B + 3B^2 + \cdots + mB^{m-1}) = \frac{1 - B^m - mB^m(1 - B)}{(1 - B)^2}$$

since $(1 - B)S_m^{(2)} = S_m - mB^m$, and so on. Then the relation between an integrated
ARMA process $z_t$ with $d = 1$, for example, and the corresponding stationary ARMA
process $w_t = (1 - B)z_t$, in terms of values back to some earlier time origin $k < t$, can be
expressed as

$$z_t = \frac{S_{t-k}}{1 - B^{t-k}}w_t = \frac{1}{1 - B^{t-k}}(w_t + w_{t-1} + \cdots + w_{k+1})$$

so that $z_t = w_t + w_{t-1} + \cdots + w_{k+1} + z_k$ can be thought of as the sum of a finite number
of terms from the stationary process $w$ plus an initializing value of the process $z$
at time $k$. Hence, in the formal definition of the stochastic properties of a nonstationary
ARIMA process as generated in (4.1.3), it would typically be necessary to specify initializing
conditions for the process at some time point $k$ in the finite (but possibly remote) past.
However, these initial condition specifications will have little effect on most of the important
characteristics of the process, and such specifications will for the most part not be
emphasized in this book.
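
The two closed forms for the finite summation operators can be checked by multiplying out
polynomials in $B$. The Python sketch below is an illustration, not part of the text: the helper
name `poly_mul` and the choice $m = 5$ are assumptions. Polynomials are stored as coefficient
lists with the lowest power first.

```python
def poly_mul(p, q):
    """Multiply two polynomials in B given as coefficient lists (lowest power first)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            out[i + j] += p_i * q_j
    return out

m = 5
one_minus_b = [1, -1]                 # 1 - B
s_m = [1] * m                         # 1 + B + ... + B^(m-1)
s_m2 = [i + 1 for i in range(m)]      # 1 + 2B + ... + m B^(m-1)

# (1 - B) S_m should equal 1 - B^m
lhs1 = poly_mul(one_minus_b, s_m)
rhs1 = [1] + [0] * (m - 1) + [-1]

# (1 - B)^2 S_m^(2) should equal 1 - B^m - m B^m (1 - B)
lhs2 = poly_mul(poly_mul(one_minus_b, one_minus_b), s_m2)
rhs2 = [1] + [0] * (m - 1) + [-1 - m, m]

print(lhs1 == rhs1, lhs2 == rhs2)
```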

As mentioned in Chapter 1, the model (4.1.5) is equivalent to representing the process $z_t$
as the output from a linear filter (unless $d = 0$, this is an unstable linear filter), whose input
is white noise $a_t$. Alternatively, we can regard it as a device for transforming the highly
dependent, and possibly nonstationary, process $z_t$ to a sequence of uncorrelated random
variables $a_t$, that is, for transforming the process to white noise.

If in (4.1.5), the autoregressive operator $\phi(B)$ is of order $p$, the $d$th difference is taken,
and the moving average operator $\theta(B)$ is of order $q$, we say that we have an ARIMA model
of order $(p, d, q)$, or simply an ARIMA$(p, d, q)$ process.

Two Interpretations of the ARIMA Model. We now show that the ARIMA model is an
intuitively reasonable model for many time series that occur in practice. First, we note that
the local behavior of a stationary time series is heavily dependent on the level of $z_t$. This
is to be contrasted with the behavior of series such as those in Figure 4.1, where the local
behavior of the series appears to be independent of its level.

If we are to use models for which the behavior of the process is independent of its level,
we must choose the autoregressive operator $\varphi(B)$ such that

$$\varphi(B)(z_t + c) = \varphi(B)z_t$$

where $c$ is any constant. Thus $\varphi(B)$ must be of the form

$$\varphi(B) = \phi_1(B)(1 - B) = \phi_1(B)\nabla$$

Therefore, a class of processes having the desired property will be of the form

$$\phi_1(B)w_t = \theta(B)a_t$$

where $w_t = \nabla\tilde{z}_t = \nabla z_t$. Required homogeneity excludes the possibility that $w_t$ should
increase explosively. This means that either $\phi_1(B)$ is a stationary autoregressive operator
or $\phi_1(B) = \phi_2(B)(1 - B)$, so that $\phi_2(B)w_t = \theta(B)a_t$, where now $w_t = \nabla^2 z_t$. In the latter
case, the same argument can be applied to the second difference, and so on.

Eventually, we arrive at the conclusion that for the representation of time series that
are nonstationary but nevertheless exhibit homogeneity, the operator on the left of (4.1.3)
should be of the form $\phi(B)\nabla^d$, where $\phi(B)$ is a stationary autoregressive operator. Thus,
we are led back to the model (4.1.5).

To approach the model from a somewhat different viewpoint, consider the situation
where $d = 0$ in (4.1.4), so that $\phi(B)z_t = \theta(B)a_t$. The requirement that the zeros of $\phi(B)$ lie
outside the unit circle would ensure not only that the process $z_t$ was stationary with mean
zero, but also that $\nabla z_t, \nabla^2 z_t, \nabla^3 z_t, \ldots$ were each stationary with mean zero. Figure 4.3(a)
shows one kind of nonstationary series we would like to represent. This series is homogeneous
except in level, in that except for a vertical translation, one part of it looks much the
same as another. We can represent such behavior by retaining the requirement that each of
the differences be stationary with zero mean, but letting the level “go free.” We do this by
using the model

$$\phi(B)\nabla z_t = \theta(B)a_t$$

Figure 4.3(b) shows a second kind of nonstationarity of fairly common occurrence. The
series has neither a fixed level nor a fixed slope, but its behavior is homogeneous if we
allow for differences in these characteristics. We can represent such behavior by the model

$$\phi(B)\nabla^2 z_t = \theta(B)a_t$$

which ensures stationarity and zero mean for all differences after the first and second but
allows the level and the slope to “go free.”

FIGURE 4.3 Two kinds of homogeneous nonstationary behavior. (a) A series showing nonstationarity
in level such as can be represented by the model $\phi(B)\nabla z_t = \theta(B)a_t$. (b) A series showing
nonstationarity in level and in slope such as can be represented by the model $\phi(B)\nabla^2 z_t = \theta(B)a_t$.


4.1.3 General Form of the ARIMA Model 

For reasons to be given below, it is sometimes useful to consider a slight extension of the
ARIMA model in (4.1.5), by adding a constant term $\theta_0$, yielding the more general form

$$\varphi(B)z_t = \phi(B)\nabla^d z_t = \theta_0 + \theta(B)a_t \qquad (4.1.9)$$

where

$$\phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p$$

$$\theta(B) = 1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q$$

In what follows:

1. $\phi(B)$ will be called the autoregressive operator; it is assumed to be stationary, that
is, the roots of $\phi(B) = 0$ lie outside the unit circle.

2. $\varphi(B) = \phi(B)\nabla^d$ will be called the generalized autoregressive operator; it is a nonstationary
operator with $d$ of the roots of $\varphi(B) = 0$ equal to unity, that is, $d$ unit roots.

3. $\theta(B)$ will be called the moving average operator; it is assumed to be invertible, that
is, the roots of $\theta(B) = 0$ lie outside the unit circle.

When $d = 0$, this model represents a stationary process. The requirements of stationarity
and invertibility apply independently, and, in general, the operators $\phi(B)$ and $\theta(B)$ will not
be of the same order. Examples of the stationarity regions for the simple cases of $p = 1, 2$
and the identical invertibility regions for $q = 1, 2$ were given in Chapter 3.

Stochastic and Deterministic Trends. When the constant term $\theta_0$ is omitted, the model
(4.1.9) is capable of representing series that have stochastic trends, as typified, for example,
by random changes in the level and slope of the series. In general, however, we may wish
to include a deterministic function of time $f(t)$ in the model. In particular, automatic
allowance for a deterministic polynomial trend, of degree $d$, can be made by permitting $\theta_0$
to be nonzero. For example, when $d = 1$, we may use the model with $\theta_0 \neq 0$ to represent
a possible deterministic linear trend in the presence of nonstationary noise. Since, from
(3.1.22), to allow $\theta_0$ to be nonzero is equivalent to permitting

$$E[w_t] = E[\nabla^d z_t] = \mu_w = \frac{\theta_0}{1 - \phi_1 - \phi_2 - \cdots - \phi_p}$$

to be nonzero, an alternative way of expressing this more general model (4.1.9) is in the
form of a stationary invertible ARMA process in $\tilde{w}_t = w_t - \mu_w$. That is,

$$\phi(B)\tilde{w}_t = \theta(B)a_t \qquad (4.1.10)$$

Notice, when $d = 1$, for example, $\nabla z_t = w_t = \tilde{w}_t + \mu_w$ implies that $z_t = \tilde{z}_t + \mu_w t + \alpha$,
where $\alpha$ is an intercept constant and the process $\tilde{z}_t$ is such that $\nabla\tilde{z}_t = \tilde{w}_t$, which has zero
mean. Thus, $\theta_0 \neq 0$ allows for a deterministic linear trend component in $z_t$ with slope
$\mu_w = \theta_0/(1 - \phi_1)$.

In many applications, where no physical reason for a deterministic component exists,
the mean of $w$ can be assumed to be zero unless such an assumption is inconsistent with
the data. In many cases, the assumption of a stochastic trend is more realistic than the
assumption of a deterministic trend. This is of special importance in forecasting, since a
stochastic trend does not require the series to follow the trend pattern seen in the past. In
what follows, when $d > 0$, we will often assume that $\mu_w = 0$, or equivalently, that $\theta_0 = 0$,
unless it is clear from the data or from the nature of the problem that a nonzero mean, or
more generally a deterministic component of known form, is needed.
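
The link between $\theta_0$ and the slope of the deterministic trend can be seen in a small Python
sketch. It is an illustration under stated assumptions, not part of the text: the model is taken
to be $(1 - \phi_1 B)\nabla z_t = \theta_0 + a_t$, the parameter values are arbitrary, and the shocks are
switched off so that only the deterministic component remains.

```python
def simulate_w(theta0, phi, shocks, w0=0.0):
    """w_t = theta0 + phi*w_{t-1} + a_t, an AR(1) model for w_t = (1 - B) z_t."""
    w = [w0]
    for a_t in shocks:
        w.append(theta0 + phi * w[-1] + a_t)
    return w[1:]

theta0, phi = 0.5, 0.6                   # assumed illustrative values
mu_w = theta0 / (1.0 - phi)              # theoretical mean of w_t

w = simulate_w(theta0, phi, shocks=[0.0] * 200)   # shocks switched off

# Sum w once to obtain z_t, starting from an assumed level of zero
z = [0.0]
for w_t in w:
    z.append(z[-1] + w_t)

slope = z[-1] - z[-2]                    # increment of z per time step, late in the series
print(mu_w, slope)
```

With the shocks restored, the same slope appears as the average increment of $z_t$, overlaid
by the stochastic variation of $\tilde{w}_t$.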


Some Important Special Cases of the ARIMA Model. In Chapter 3, we examined some
important special cases of the model (4.1.9), corresponding to the stationary situation,
$d = 0$. The following models represent some special cases of the nonstationary model
($d \geq 1$), which seem to be common in practice.

1. The (0, 1, 1) process:

$$\nabla z_t = a_t - \theta_1 a_{t-1} = (1 - \theta_1 B)a_t$$

corresponding to $p = 0$, $d = 1$, $q = 1$, $\phi(B) = 1$, $\theta(B) = 1 - \theta_1 B$.

2. The (0, 2, 2) process:

$$\nabla^2 z_t = a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2} = (1 - \theta_1 B - \theta_2 B^2)a_t$$

corresponding to $p = 0$, $d = 2$, $q = 2$, $\phi(B) = 1$, $\theta(B) = 1 - \theta_1 B - \theta_2 B^2$.

3. The (1, 1, 1) process:

$$\nabla z_t - \phi_1\nabla z_{t-1} = a_t - \theta_1 a_{t-1}$$

or

$$(1 - \phi_1 B)\nabla z_t = (1 - \theta_1 B)a_t$$

corresponding to $p = 1$, $d = 1$, $q = 1$, $\phi(B) = 1 - \phi_1 B$, $\theta(B) = 1 - \theta_1 B$.

For the representation of nonseasonal time series (seasonal models are considered in
Chapter 9), we rarely seem to meet situations for which either $p$, $d$, or $q$ need to be greater
than 2. Frequently, values of zero or unity will be appropriate for one or more of these
orders. For example, we show later that Series A, B, C, and D given in Figure 4.1 are
reasonably well represented² by the simple models shown in Table 4.1.

TABLE 4.1 Summary of Simple Nonstationary Models Fitted to Time Series of Figure 4.1

Series    Model                                 Order of Model
A         $\nabla z_t = (1 - 0.7B)a_t$          (0, 1, 1)
B         $\nabla z_t = (1 + 0.1B)a_t$          (0, 1, 1)
C         $(1 - 0.8B)\nabla z_t = a_t$          (1, 1, 0)
D         $\nabla z_t = (1 - 0.1B)a_t$          (0, 1, 1)
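
To see how the simplest of these models behaves, the (0, 1, 1) process can be run on a single
unit shock. The Python sketch below is illustrative only: the function name is an assumption,
$\theta_1 = 0.7$ is an arbitrary value, and the shock before the start of the record is taken to
be zero. The shock raises the level by 1 when it occurs, and a fraction $1 - \theta_1$ of it remains
in the level permanently.

```python
def ima11(shocks, theta, z0=0.0):
    """z_t = z_{t-1} + a_t - theta*a_{t-1}, the (0, 1, 1) process, assuming a_0 = 0."""
    z, a_prev = [z0], 0.0
    for a_t in shocks:
        z.append(z[-1] + a_t - theta * a_prev)
        a_prev = a_t
    return z[1:]

# A single unit shock followed by no further disturbance
z = ima11([1.0, 0.0, 0.0, 0.0, 0.0], theta=0.7)
print(z)
```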


Nonlinear Transformation of z. The range of useful applications of the model (4.1.9)
widens considerably if we allow the possibility of transformation. Thus, we may substitute
$z_t^{(\lambda)}$ for $z_t$ in (4.1.9), where $z_t^{(\lambda)}$ is some nonlinear transformation of $z_t$ involving one or
more parameters $\lambda$. A suitable transformation may be suggested by the application, or in
some cases it can be estimated from the data. For example, if we were interested in the sales
of a recently introduced commodity, we might find that the sales volume was increasing at
a rapid rate and that it was the percentage fluctuation that showed nonstationary stability
(homogeneity) rather than the absolute fluctuation. This would support the analysis of the
logarithm of sales since

$$\nabla\log(z_t) = \log\left(\frac{z_t}{z_{t-1}}\right) = \log\left(1 + \frac{\nabla z_t}{z_{t-1}}\right) \approx \frac{\nabla z_t}{z_{t-1}}$$

where $\nabla z_t/z_{t-1}$ are the relative or percentage changes, the approximation holding if the
relative changes are not excessively large. When the data cover a wide range and especially
for seasonal data, estimation of the transformation using the approach of Box and Cox
(1964) may be helpful (for an example, see Section 9.3.5). This approach considers the
family of power transformations of the form $z_t^{(\lambda)} = (z_t^{\lambda} - 1)/\lambda$ for $\lambda \neq 0$ and $z_t^{(0)} = \log(z_t)$
for $\lambda = 0$.

Software to estimate the parameter $\lambda$ in the Box-Cox power transformation is available
in the TSA and MASS libraries of R. For example, the function BoxCox.ar() in the TSA
package finds a power transformation so that the transformed series is approximately a
Gaussian AR process.
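
The power transformation itself is simple to compute. The following Python sketch is an
illustration, not the R software mentioned above: the function name `box_cox` and the
numerical values are assumptions. It evaluates the transformation for a single positive value
and compares the differenced logarithm with the relative change.

```python
import math

def box_cox(z, lam):
    """Box-Cox power transformation of a positive value z with parameter lam."""
    if z <= 0:
        raise ValueError("z must be positive")
    if lam == 0:
        return math.log(z)               # limiting case lam = 0
    return (z ** lam - 1.0) / lam

z_prev, z_now = 100.0, 103.0             # assumed successive observations
log_diff = box_cox(z_now, 0) - box_cox(z_prev, 0)   # difference of logs
rel_change = (z_now - z_prev) / z_prev              # relative change
print(log_diff, rel_change)

# For lam close to zero the power form approaches the logarithm
print(box_cox(z_now, 1e-6), math.log(z_now))
```

This sketch only applies a given $\lambda$; it does not estimate $\lambda$ from data.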


² As is discussed more fully later, there are certain advantages in using a nonstationary rather than a stationary
model in cases of doubt. In particular, none of the fitted models above assume that $z_t$ has a fixed mean. However,
we show in Chapter 7 that it is possible in certain cases to obtain stationary models of slightly better fit.

4.2 THREE EXPLICIT FORMS FOR THE ARIMA MODEL

We now consider three different “explicit” forms for the general model (4.1.9). Each of
these allows some special aspect to be appreciated. Thus, the current value $z_t$ of the process
can be expressed

1. In terms of previous values of the $z$’s and current and previous values of the $a$’s, by
direct use of the difference equation,

2. In terms of current and previous shocks $a_{t-j}$ only, and

3. In terms of a weighted sum of previous values $z_{t-j}$ of the process and the current
shock $a_t$.

In this chapter, we are concerned primarily with nonstationary models in which $\nabla^d z_t$
is a stationary process and $d$ is greater than zero. For such models, we can, without loss of
generality, omit $\mu$ from the specification or equivalently replace $\tilde{z}_t$ by $z_t$. The results of this
chapter and the next will, however, apply to stationary models for which $d = 0$, provided
that $z_t$ is then interpreted as the deviation from the mean $\mu$.


4.2.1 Difference Equation Form of the Model 

Direct use of the difference equation permits us to express the current value $z_t$ of the process
in terms of previous values of the $z$’s and of the current and previous values of the $a$’s.
Thus, if

$$\varphi(B) = \phi(B)(1 - B)^d = 1 - \varphi_1 B - \varphi_2 B^2 - \cdots - \varphi_{p+d}B^{p+d}$$

the general model (4.1.9), with $\theta_0 = 0$, may be written as

$$z_t = \varphi_1 z_{t-1} + \cdots + \varphi_{p+d}z_{t-p-d} - \theta_1 a_{t-1} - \cdots - \theta_q a_{t-q} + a_t \qquad (4.2.1)$$

For example, consider the process represented by the model of order (1, 1, 1)

$$(1 - \phi B)(1 - B)z_t = (1 - \theta B)a_t$$

where, for convenience, we drop the subscript 1 on $\phi_1$ and $\theta_1$. Then this process may be
written as

$$[1 - (1 + \phi)B + \phi B^2]z_t = (1 - \theta B)a_t$$

that is,

$$z_t = (1 + \phi)z_{t-1} - \phi z_{t-2} + a_t - \theta a_{t-1} \qquad (4.2.2)$$

with $\varphi_1 = 1 + \phi$ and $\varphi_2 = -\phi$ in the notation introduced above. For many purposes, and, in
particular, for calculating forecasts, the difference equation (4.2.1) is the most convenient
form to use.
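
The expansion of the generalized autoregressive operator and the recursion (4.2.1) can be
sketched in Python as follows. This is an illustration, not the text's own procedure: the
function names are assumptions, $\theta_0 = 0$, and all values before the start of the record are
taken to be zero.

```python
def poly_mul(p, q):
    """Multiply two polynomials in B given as coefficient lists (lowest power first)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            out[i + j] += p_i * q_j
    return out

def generalized_ar(phi, d):
    """Return [varphi_1, ..., varphi_{p+d}] for varphi(B) = phi(B)(1 - B)^d."""
    poly = [1.0] + [-c for c in phi]      # 1 - phi_1 B - ... - phi_p B^p
    for _ in range(d):
        poly = poly_mul(poly, [1.0, -1.0])
    return [-c for c in poly[1:]]         # sign convention of (4.2.1)

def simulate_arima(phi, d, theta, shocks):
    """Run (4.2.1) forward, treating z and a before the first shock as zero."""
    varphi = generalized_ar(phi, d)
    z, a = [], []
    for a_t in shocks:
        a.append(a_t)
        t = len(a) - 1
        value = a_t
        for i, c in enumerate(varphi, start=1):
            if t - i >= 0:
                value += c * z[t - i]     # autoregressive part
        for j, th in enumerate(theta, start=1):
            if t - j >= 0:
                value -= th * a[t - j]    # moving average part
        z.append(value)
    return z

print(generalized_ar([0.5], 1))           # coefficients 1 + phi and -phi for phi = 0.5
print(simulate_arima([0.5], 1, [0.3], [1.0, 0.0, 0.0, 0.0]))
```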
4.2.2 Random Shock Form of the Model

Model in Terms of Current and Previous Shocks. As discussed in Chapter 3, a linear
model can be written as the output $z_t$ from the linear filter

$$z_t = a_t + \psi_1 a_{t-1} + \psi_2 a_{t-2} + \cdots = a_t + \sum_{j=1}^{\infty}\psi_j a_{t-j} = \psi(B)a_t \qquad (4.2.3)$$

whose input is white noise, or a sequence of uncorrelated shocks $a_t$ with mean 0 and
common variance $\sigma_a^2$. It is sometimes useful to express the ARIMA model in this form,
and, in particular, the $\psi$ weights will be needed in Chapter 5 to calculate the variance of the
forecast errors. However, since the nonstationary ARIMA processes are not in statistical
equilibrium over time, they cannot be assumed to extend infinitely into the past, and hence
an infinite representation as in (4.2.3) will not be possible. But a related finite truncated
form, which will be discussed subsequently, always exists. We now show that the $\psi$ weights
for an ARIMA process may be obtained directly from the difference equation form of the
model.

General Expression for the $\psi$ Weights. If we operate on both sides of (4.2.3) with the
generalized autoregressive operator $\varphi(B)$, we obtain

$$\varphi(B)z_t = \varphi(B)\psi(B)a_t$$

However, since $\varphi(B)z_t = \theta(B)a_t$, it follows that

$$\varphi(B)\psi(B) = \theta(B) \qquad (4.2.4)$$

Therefore, the $\psi$ weights may be obtained by equating coefficients of $B$ in the expansion

$$(1 - \varphi_1 B - \cdots - \varphi_{p+d}B^{p+d})(1 + \psi_1 B + \psi_2 B^2 + \cdots) = (1 - \theta_1 B - \cdots - \theta_q B^q) \qquad (4.2.5)$$

Thus, we find that the $\psi_j$ weights of the ARIMA process can be determined recursively
through the equations

$$\psi_j = \varphi_1\psi_{j-1} + \varphi_2\psi_{j-2} + \cdots + \varphi_{p+d}\psi_{j-p-d} - \theta_j \qquad j > 0$$

with $\psi_0 = 1$, $\psi_j = 0$ for $j < 0$, and $\theta_j = 0$ for $j > q$. We note that for $j$ greater than the
larger of $p + d - 1$ and $q$, the $\psi$ weights satisfy the homogeneous difference equation
defined by the generalized autoregressive operator, that is,

$$\varphi(B)\psi_j = \phi(B)(1 - B)^d\psi_j = 0 \qquad (4.2.6)$$

where $B$ now operates on the subscript $j$. Thus, for sufficiently large $j$, the weights $\psi_j$ are
represented by a mixture of polynomials, damped exponentials, and damped sinusoids in
the argument $j$.

Example. For illustration, consider the (1, 1, 1) process (4.2.2), for which

$$\varphi(B) = (1 - \phi B)(1 - B) = 1 - (1 + \phi)B + \phi B^2$$

and

$$\theta(B) = 1 - \theta B$$

Substituting in (4.2.5) gives

$$[1 - (1 + \phi)B + \phi B^2](1 + \psi_1 B + \psi_2 B^2 + \cdots) = 1 - \theta B$$

and hence the $\psi_j$ satisfy the recursion $\psi_j = (1 + \phi)\psi_{j-1} - \phi\psi_{j-2}$, $j \geq 2$, with $\psi_0 = 1$ and
$\psi_1 = (1 + \phi) - \theta$. Thus, since the roots of $\varphi(B) = (1 - \phi B)(1 - B) = 0$ are $G_1^{-1} = 1$ and
$G_2^{-1} = \phi^{-1}$, we have, in general,

$$\psi_j = A_0 + A_1\phi^j \qquad (4.2.7)$$

where the constants $A_0$ and $A_1$ are determined from the initial values $\psi_0 = A_0 + A_1 = 1$
and $\psi_1 = A_0 + A_1\phi = 1 + \phi - \theta$ as

$$A_0 = \frac{1 - \theta}{1 - \phi} \qquad A_1 = \frac{\theta - \phi}{1 - \phi}$$

Thus, informally, we may wish to express model (4.2.2) in the equivalent form

$$z_t = \sum_{j=0}^{\infty}(A_0 + A_1\phi^j)a_{t-j} \qquad (4.2.8)$$

Since $|\phi| < 1$, the weights $\psi_j$ tend to $A_0$ for large $j$, so that shocks $a_{t-j}$, which entered
in the remote past, receive a constant weight $A_0$. However, the representation in (4.2.8)
is strictly not valid because the infinite sum on the right does not converge in any sense;
that is, the weights $\psi_j$ are not absolutely summable as in the case of a stationary process.
A related truncated version of the random shock form of the model is always valid, as we
discuss in detail shortly. Nevertheless, for notational convenience, we will often refer to
the infinite random shock form (4.2.3) of an ARIMA process, even though this form is
strictly not convergent, as a simple notational device to represent the valid truncated form
in (4.2.14), in situations where the distinction between the two forms is not important.
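
The recursion for the $\psi$ weights and the closed form (4.2.7) can be compared numerically.
The Python sketch below is an illustration with assumed parameter values $\phi = 0.5$ and
$\theta = 0.3$; the function name is likewise an assumption and not part of the text.

```python
def psi_weights(varphi, theta, n):
    """Return psi_0..psi_n from psi_j = sum_i varphi_i*psi_{j-i} - theta_j, psi_0 = 1."""
    psi = [1.0]
    for j in range(1, n + 1):
        value = -theta[j - 1] if j <= len(theta) else 0.0   # theta_j = 0 for j > q
        for i, c in enumerate(varphi, start=1):
            if j - i >= 0:                                   # psi_j = 0 for j < 0
                value += c * psi[j - i]
        psi.append(value)
    return psi

phi, theta = 0.5, 0.3                    # assumed (1, 1, 1) parameters
varphi = [1.0 + phi, -phi]               # coefficients of the generalized AR operator
a0 = (1.0 - theta) / (1.0 - phi)
a1 = (theta - phi) / (1.0 - phi)

recursive = psi_weights(varphi, [theta], 10)
closed = [a0 + a1 * phi ** j for j in range(11)]
for j, (r, c) in enumerate(zip(recursive, closed)):
    print(j, r, c)
```

As $j$ grows, both columns approach $A_0$, which is the constant weight given to remote
shocks.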


Truncated Form of the Random Shock Model. For technical purposes, it is necessary and
in some cases convenient to consider the model in a form slightly different from (4.2.3).
Suppose that we wish to express the current value $z_t$ of the process in terms of the $t - k$
shocks $a_t, a_{t-1}, \ldots, a_{k+1}$, which have entered the system since some time origin $k < t$. This
time origin $k$ might, for example, be the time at which the process was first observed.

The general model

$$\varphi(B)z_t = \theta(B)a_t \qquad (4.2.9)$$

is a difference equation with the solution

$$z_t = C_k(t - k) + I_k(t - k) \qquad (4.2.10)$$

A short discussion of linear difference equations is given in Appendix A4.1. We remind
the reader that the solution of such equations closely parallels the solution of linear
differential equations. The complementary function $C_k(t - k)$ is the general solution of the
homogeneous difference equation

$$\varphi(B)C_k(t - k) = 0 \qquad (4.2.11)$$

In general, this solution will consist of a linear combination of certain functions of time.
These functions are powers $t^j$, real geometric (exponential) terms $G^t$, and complex geometric
(exponential) terms $D^t\sin(2\pi f_0 t + F)$, where the constants $G$, $f_0$, and $F$ are functions
of the parameters $(\phi, \theta)$ of the model. The coefficients that form the linear combinations
of these terms can be determined so as to satisfy a set of initial conditions defined by the
values of the process before time $k + 1$. The particular integral $I_k(t - k)$ is any function
that satisfies

$$\varphi(B)I_k(t - k) = \theta(B)a_t \qquad (4.2.12)$$

It should be carefully noted that in this expression $B$ operates on $t$ and not on $k$. It is shown
in Appendix A4.1 that this equation is satisfied for $t - k > q$ by

$$I_k(t - k) = \sum_{j=0}^{t-k-1}\psi_j a_{t-j} = a_t + \psi_1 a_{t-1} + \cdots + \psi_{t-k-1}a_{k+1} \qquad t > k \qquad (4.2.13)$$

with $I_k(t - k) = 0$, $t \leq k$. This particular integral $I_k(t - k)$, thus, represents the finite
truncated form of the infinite random shock form (4.2.3), while the complementary function
$C_k(t - k)$ embodies the “initializing” features of the process $z$ in the sense that $C_k(t - k)$ is
already determined or specified by the time $k + 1$. Hence, the truncated form of the random
shock model for the ARIMA process (4.1.3) is given by

$$z_t = \sum_{j=0}^{t-k-1}\psi_j a_{t-j} + C_k(t - k) \qquad (4.2.14)$$

For illustration, consider Figure 4.4. The above discussion implies that any observation
$z_t$ can be considered in relation to any previous time $k$ and can be divided up into two
additive parts. The first part $C_k(t - k)$ is the component of $z_t$ already determined at time
$k$, and indicates what the observations prior to time $k + 1$ had to tell us about the value of
the series at time $t$. It represents the course that the process would take if at time $k$, the
source of shocks $a_t$ had been “switched off.” The second part, $I_k(t - k)$, represents an
additional component, unpredictable at time $k$, which embodies the entire effect of shocks
entering the system after time $k$. Hence, to specify an ARIMA process, one must specify
the initializing component $C_k(t - k)$ in (4.2.14) for some time origin $k$ in the finite (but
possibly remote) past, with the remaining course of the process being determined through
the truncated random shock terms in (4.2.14).

Example. For illustration, consider again the example

$$(1 - \phi B)(1 - B)z_t = (1 - \theta B)a_t$$

FIGURE 4.4 Role of the complementary function $C_k(t - k)$ and of the particular integral $I_k(t - k)$
in describing the behavior of a time series.

The complementary function is the solution of the difference equation

$$(1 - \phi B)(1 - B)C_k(t - k) = 0$$

that is,

$$C_k(t - k) = b_0^{(k)} + b_1^{(k)}\phi^{t-k}$$

where $b_0^{(k)}$, $b_1^{(k)}$ are coefficients that depend on the past history of the process and, it will
be noted, change with the origin $k$.

Making use of the $\psi$ weights (4.2.7), a particular integral (4.2.13) is

$$I_k(t - k) = \sum_{j=0}^{t-k-1}(A_0 + A_1\phi^j)a_{t-j}$$

so that, finally, we can write the model (4.2.8) in the equivalent form

$$z_t = b_0^{(k)} + b_1^{(k)}\phi^{t-k} + \sum_{j=0}^{t-k-1}(A_0 + A_1\phi^j)a_{t-j} \qquad (4.2.15)$$

Note that since $|\phi| < 1$, if $t - k$ is chosen sufficiently large, the term involving $\phi^{t-k}$ in this
expression is negligible and may be ignored.


Link Between the Truncated and Nontruncated Forms of the Random Shock Model.
Returning to the general case, we can always think of the process with reference to some
(possibly remote) finite origin $k$, with the process having the truncated random shock form
as in (4.2.14). By comparison with the nontruncated form in (4.2.3), one can see that we
might, informally, make the correspondence of representing the complementary function
$C_k(t - k)$ in terms of the $\psi$ weights as

$$C_k(t - k) = \sum_{j=t-k}^{\infty}\psi_j a_{t-j} \qquad (4.2.16)$$

even though, formally, the infinite sum on the right of (4.2.16) does not converge. As
mentioned earlier, for notational simplicity, we will often use this correspondence.

In summary, then, for the general model (4.2.9), 

1. We can express the value z t of the process, informally, as an infinite weighted sum 
of current and previous shocks a,_j , according to 

00 

z, = X Vj a t-j = ¥(B)a t 

7=o 

2. The value of z, can be expressed, more formally, as a weighted finite sum of the t — k 
current and previous shocks occurring after some origin k, plus a complementary 
function C k (t — k). This finite sum consists of the first t — k terms of the infinite 
sum, so that 


t-k -1 

z,=C k (t-k)+ ^ Vjdt-j (4.2.17) 

j=o 

Finally, the complementary function C k (t — k ) can be taken, for notational conve¬ 
nience, to be represented as the truncated infinite sum, so that 

00 

C k (t-k)= ^ VjUt-j (4.2.18) 

j=t-k 

For illustration, consider once more the model 

(1 - 1 - B)z t = (1 - 0B)a t 

We can write z. t either, informally, as an infinite sum of the a t _j ’ s 

00 

z t = 2 (A o + 

7=0 

or, more formally, in terms of the weighted finite sum as 

t-k-l 

z, = C k (t -k)+ ^ (A q + 

7=0 

Furthermore, the complementary function can be written as

$$C_k(t-k) = b_0^{(k)} + b_1^{(k)} \phi^{t-k}$$

where $b_0^{(k)}$ and $b_1^{(k)}$, which satisfy the initial conditions through time $k$, are

$$b_0^{(k)} = \frac{z_k - \phi z_{k-1} - \theta a_k}{1 - \phi} \qquad b_1^{(k)} = \frac{-\phi(z_k - z_{k-1}) + \theta a_k}{1 - \phi}$$

The complementary function can also be represented, informally, as the truncated infinite
sum

$$C_k(t-k) = \sum_{j=t-k}^{\infty} (A_0 + A_1 \phi^j) a_{t-j}$$

from which it can be seen that $b_0^{(k)}$ and $b_1^{(k)}$ may be represented as

$$b_0^{(k)} = A_0 \sum_{j=t-k}^{\infty} a_{t-j} = \frac{1-\theta}{1-\phi} \sum_{j=t-k}^{\infty} a_{t-j}$$

$$b_1^{(k)} = A_1 \sum_{j=t-k}^{\infty} \phi^{j-(t-k)} a_{t-j} = \frac{\theta-\phi}{1-\phi} \sum_{j=t-k}^{\infty} \phi^{j-(t-k)} a_{t-j}$$
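The truncated form for this ARIMA(1,1,1) illustration can be checked numerically. The sketch below is only an illustration under stated assumptions: the function names, the zero start-up values, the seed, and the parameter values $\phi = -0.3$, $\theta = 0.5$ are choices made here, and $A_0 = (1-\theta)/(1-\phi)$, $A_1 = (\theta-\phi)/(1-\phi)$ are the constants implied by the formulas above. It rebuilds $z_t$ from the complementary function at origin $k$ plus the finite weighted sum of later shocks.

```python
import random

def simulate_arima111(phi, theta, n, seed=0):
    """Simulate (1 - phi B)(1 - B) z_t = (1 - theta B) a_t with zero start-up values."""
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z = [0.0] * n
    for t in range(n):
        z1 = z[t - 1] if t >= 1 else 0.0
        z2 = z[t - 2] if t >= 2 else 0.0
        a1 = a[t - 1] if t >= 1 else 0.0
        z[t] = (1 + phi) * z1 - phi * z2 + a[t] - theta * a1
    return z, a

def truncated_form(z, a, phi, theta, k, t):
    """Rebuild z_t from origin k: C_k(t-k) plus the finite psi-weighted sum of shocks."""
    A0 = (1 - theta) / (1 - phi)
    A1 = (theta - phi) / (1 - phi)
    b0 = (z[k] - phi * z[k - 1] - theta * a[k]) / (1 - phi)
    b1 = (-phi * (z[k] - z[k - 1]) + theta * a[k]) / (1 - phi)
    comp = b0 + b1 * phi ** (t - k)  # complementary function C_k(t-k)
    shocks = sum((A0 + A1 * phi ** j) * a[t - j] for j in range(t - k))
    return comp + shocks

phi, theta = -0.3, 0.5
z, a = simulate_arima111(phi, theta, 60)
k, t = 20, 45
rebuilt = truncated_form(z, a, phi, theta, k, t)
print(abs(rebuilt - z[t]) < 1e-9)
```

Changing the origin $k$ changes the split between the complementary function and the shock sum, but not the rebuilt value of $z_t$.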


Complementary Function as a Conditional Expectation. One consequence of the truncated
form (4.2.14) is that for $m > 0$,

$$C_k(t-k) = C_{k-m}(t-k+m) + \psi_{t-k} a_k + \psi_{t-k+1} a_{k-1} + \cdots + \psi_{t-k+m-1} a_{k-m+1} \tag{4.2.19}$$

which shows how the complementary function changes as the origin $k$ is changed. Now
denote by $E_k[z_t]$ the conditional expectation of $z_t$ at time $k$, that is, the expectation given
complete historical knowledge of the process up to, but not beyond, time $k$. To calculate
this expectation, note that

$$E_k[a_j] = \begin{cases} 0 & j > k \\ a_j & j \le k \end{cases}$$

That is, standing at time $k$, the expected values of the future $a$'s are zero and of those that
have happened already are their actually realized values.

By taking conditional expectations at time $k$ on both sides of (4.2.17), we obtain
$E_k[z_t] = C_k(t-k)$. Thus, for $(t-k) > q$, the complementary function provides the expected value
of the future value $z_t$ of the process, viewed from time $k$ and based on knowledge of
the past. The particular integral shows how that expectation is modified by subsequent
events represented by the shocks $a_{k+1}, a_{k+2}, \ldots, a_t$. In the problem of forecasting, which
we discuss in Chapter 5, it will turn out that $C_k(t-k)$ is the minimum mean square error
forecast of $z_t$ made at time $k$. Equation (4.2.19) may be used in "updating" this forecast.


4.2.3 Inverted Form of the Model 

Model in Terms of Previous $z$'s and the Current Shock $a_t$. We have seen in Section 3.1.1
that the model

$$z_t = \psi(B) a_t$$

may also be written in the inverted form

$$\psi^{-1}(B) z_t = a_t$$

or

$$\pi(B) z_t = \Bigl(1 - \sum_{j=1}^{\infty} \pi_j B^j\Bigr) z_t = a_t \tag{4.2.20}$$

Thus, $z_t$ is an infinite weighted sum of previous values of $z$, plus a random shock:

$$z_t = \pi_1 z_{t-1} + \pi_2 z_{t-2} + \cdots + a_t$$

Because of the invertibility condition, the $\pi$ weights must form a convergent series; that is,
$\pi(B)$ must converge on or within the unit circle.

General Expression for the $\pi$ Weights. To derive the $\pi$ weights for the general ARIMA
model, we can substitute (4.2.20) in

$$\varphi(B) z_t = \theta(B) a_t$$

to obtain

$$\varphi(B) z_t = \theta(B) \pi(B) z_t$$

Hence, the $\pi$ weights can be obtained explicitly by equating coefficients of $B$ in

$$\varphi(B) = \theta(B) \pi(B) \tag{4.2.21}$$

that is,

$$(1 - \varphi_1 B - \cdots - \varphi_{p+d} B^{p+d}) = (1 - \theta_1 B - \cdots - \theta_q B^q)(1 - \pi_1 B - \pi_2 B^2 - \cdots) \tag{4.2.22}$$

Thus, we find that the $\pi_j$ weights of the ARIMA process can be determined recursively
through

$$\pi_j = \theta_1 \pi_{j-1} + \theta_2 \pi_{j-2} + \cdots + \theta_q \pi_{j-q} + \varphi_j \qquad j > 0$$

with the convention $\pi_0 = -1$, $\pi_j = 0$ for $j < 0$, and $\varphi_j = 0$ for $j > p + d$. It will be noted
that for $j$ greater than the larger of $p + d$ and $q$, the $\pi$ weights satisfy the homogeneous
difference equation defined by the moving average operator

$$\theta(B) \pi_j = 0$$

where $B$ now operates on $j$. Hence, for sufficiently large $j$, the $\pi$ weights will exhibit
behavior similar to that of the autocorrelation function (3.2.5) of an autoregressive process; that
is, they follow a mixture of damped exponentials and damped sine waves.
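The recursion for the $\pi$ weights translates directly into a few lines of code. The following is a minimal sketch, with a hypothetical helper name `pi_weights`; it takes the coefficients of the generalized autoregressive operator and of the moving average operator as plain lists, and applies the conventions $\pi_0 = -1$ and $\varphi_j = 0$ for $j > p + d$. The parameter values are those of the ARIMA(1,1,1) example used in this section ($\phi = -0.3$, $\theta = 0.5$).

```python
def pi_weights(varphi, theta, n):
    """Return [pi_1, ..., pi_n] from pi_j = sum_i theta_i * pi_{j-i} + varphi_j,
    with pi_0 = -1, pi_j = 0 for j < 0, and varphi_j = 0 beyond its last entry.

    varphi: [varphi_1, ..., varphi_{p+d}], generalized AR coefficients
    theta:  [theta_1, ..., theta_q], MA coefficients
    """
    pi = [-1.0]  # pi_0 by convention
    for j in range(1, n + 1):
        value = varphi[j - 1] if j <= len(varphi) else 0.0
        for i, th in enumerate(theta, start=1):
            if j - i >= 0:
                value += th * pi[j - i]
        pi.append(value)
    return pi[1:]

# ARIMA(1,1,1): varphi(B) = (1 - phi B)(1 - B) = 1 - (1 + phi) B + phi B^2,
# so varphi_1 = 1 + phi and varphi_2 = -phi.
phi, theta = -0.3, 0.5
weights = pi_weights([1 + phi, -phi], [theta], 7)
print([round(w, 4) for w in weights])
```

The same function covers pure IMA models by passing the coefficients of $(1-B)^d$ as `varphi`, for example `[1.0]` for $d = 1$.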

Another interesting fact is that if $d \ge 1$, the $\pi$ weights in (4.2.20) sum to unity. This may
be verified by substituting $B = 1$ in (4.2.21). Thus, $\varphi(B) = \phi(B)(1 - B)^d$ is zero when
$B = 1$ and $\theta(1) \ne 0$, because the roots of $\theta(B) = 0$ lie outside the unit circle. Hence, it
follows from (4.2.21) that $\pi(1) = 0$, that is,

$$\sum_{j=1}^{\infty} \pi_j = 1 \tag{4.2.23}$$

Therefore, if $d \ge 1$, the process may be written in the form

$$z_t = \bar{z}_{t-1}(\pi) + a_t \tag{4.2.24}$$

where

$$\bar{z}_{t-1}(\pi) = \sum_{j=1}^{\infty} \pi_j z_{t-j}$$

is a weighted average of previous values of the process.

Example. We again consider, for illustration, the ARIMA(1,1,1) process:

$$(1 - \phi B)(1 - B) z_t = (1 - \theta B) a_t$$

Then, using (4.2.21),

$$\pi(B) = \varphi(B)\theta^{-1}(B) = [1 - (1 + \phi)B + \phi B^2](1 + \theta B + \theta^2 B^2 + \cdots)$$

so that

$$\pi_1 = \phi + (1 - \theta) \qquad \pi_2 = (\theta - \phi)(1 - \theta) \qquad \pi_j = (\theta - \phi)(1 - \theta)\theta^{j-2}, \quad j \ge 3$$

The first seven $\pi$ weights corresponding to $\phi = -0.3$ and $\theta = 0.5$ are given in Table 4.2.
Thus, $z_t$ would be generated by a weighted average of previous values, plus an additional
shock, according to

$$z_t = (0.2 z_{t-1} + 0.4 z_{t-2} + 0.2 z_{t-3} + 0.1 z_{t-4} + \cdots) + a_t$$

We notice, in particular, that the $\pi$ weights die out as more and more remote values of $z_{t-j}$
are involved. This happens when $-1 < \theta < 1$, so that the series is invertible.

We mention in passing that, for models fitted to actual time series, the convergent $\pi$
weights usually die out rather quickly. Thus, although $z_t$ may be theoretically dependent
on the remote past, the representation

$$z_t = \sum_{j=1}^{\infty} \pi_j z_{t-j} + a_t$$

will usually show that $z_t$ is dependent to an important extent only on recent past values
$z_{t-j}$ of the time series. This is still true even though for nonstationary models with $d > 0$,
the $\psi$ weights in the "weighted shock" representation

$$z_t = \sum_{j=0}^{\infty} \psi_j a_{t-j}$$

do not die out to zero. What happens, of course, is that all the information that remote values
of the shocks $a_{t-j}$ supply about $z_t$ is contained in recent values $z_{t-1}, z_{t-2}, \ldots$ of the series.
In particular, the expectation $E_k[z_t]$, which in theory is conditional on complete history of
the process up to time $k$, can usually be computed to sufficient accuracy from recent values
of the time series. This fact is particularly important in forecasting applications.

TABLE 4.2 First Seven $\pi$ Weights for an ARIMA(1,1,1) Process with $\phi = -0.3$, $\theta = 0.5$

j        1     2     3     4     5      6       7
$\pi_j$  0.2   0.4   0.2   0.1   0.05   0.025   0.0125


4.3 INTEGRATED MOVING AVERAGE PROCESSES 

A nonstationary model that is useful in representing some commonly occurring series is
the (0, 1, 1) process:

$$\nabla z_t = a_t - \theta a_{t-1}$$

The model contains only two parameters, $\theta$ and $\sigma_a^2$. Figure 4.5 shows two time series
generated by this model from the same sequence of random normal deviates $a_t$. For the first
series, $\theta = 0.6$, and for the second, $\theta = 0$. Models of this kind have often been found useful
in inventory control problems, in representing certain kinds of disturbances occurring in
industrial processes, and in econometrics. We will show in Chapter 7 that this simple
process can, with suitable parameter values, supply useful representations of Series A, B,
and D shown in Figure 4.1. Another valuable model is the (0, 2, 2) process

$$\nabla^2 z_t = a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2}$$

which contains three parameters, $\theta_1$, $\theta_2$, and $\sigma_a^2$. Figure 4.6 shows two series generated
from this model using the same set of normal deviates. For the first series, the parameters
are $(\theta_1, \theta_2) = (0, 0)$ and for the second $(\theta_1, \theta_2) = (1.5, -0.8)$. The series tend to be much
smoother than those generated by the (0, 1, 1) process. The (0, 2, 2) models are useful in
representing disturbances (such as Series C) in systems with a large degree of inertia. Both
the (0, 1, 1) and the (0, 2, 2) models are special cases of the class

$$\nabla^d z_t = \theta(B) a_t \tag{4.3.1}$$

We call these models integrated moving average (IMA) processes, of order (0, d, q), and
consider their properties in the following section.
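Series of the kind shown in Figures 4.5 and 4.6 can be generated with a short routine. The sketch below is an illustration only: the helper name `simulate_ima`, the zero start-up values, the seed, and the series length are assumptions made here, so the paths will not match the book's figures. It forms the moving average part from one shared set of normal deviates and then sums it $d$ times.

```python
import random

def simulate_ima(d, theta, a):
    """Generate z_t from nabla^d z_t = theta(B) a_t by forming
    w_t = a_t - theta_1 a_{t-1} - ... and summing it d times (zero start-up)."""
    w = []
    for t in range(len(a)):
        value = a[t]
        for i, th in enumerate(theta, start=1):
            if t - i >= 0:
                value -= th * a[t - i]
        w.append(value)
    z = w
    for _ in range(d):  # each pass undoes one difference
        total, summed = 0.0, []
        for x in z:
            total += x
            summed.append(total)
        z = summed
    return z

rng = random.Random(1)
a = [rng.gauss(0.0, 1.0) for _ in range(200)]  # one shared set of deviates

ima011_a = simulate_ima(1, [0.6], a)         # theta = 0.6
ima011_b = simulate_ima(1, [0.0], a)         # theta = 0 (random walk)
ima022_a = simulate_ima(2, [0.0, 0.0], a)    # (theta_1, theta_2) = (0, 0)
ima022_b = simulate_ima(2, [1.5, -0.8], a)   # (theta_1, theta_2) = (1.5, -0.8)
```

Using the same deviates for every series, as in the text, makes the effect of the parameters visible without the distraction of different random draws.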



FIGURE 4.5 Two time series generated from IMA(0, 1, 1) models.

FIGURE 4.6 Two time series generated from IMA(0, 2, 2) models.


4.3.1 Integrated Moving Average Process of Order (0,1,1) 

Difference Equation Form. The IMA(0, 1, 1) process

$$\nabla z_t = (1 - \theta B) a_t \qquad -1 < \theta < 1$$

possesses useful representational capability, and we now study its properties in more detail.
The model can be written in terms of the $z$'s and the $a$'s in the form

$$z_t = z_{t-1} + a_t - \theta a_{t-1} \tag{4.3.2}$$

Random Shock Form of Model. Alternatively, we can obtain $z_t$ in terms of the $a$'s alone by
summing on both sides of (4.3.2). Before doing this, there is some advantage in expressing
the right-hand operator in terms of $\nabla$ rather than $B$. Thus, we can write

$$1 - \theta B = (1 - \theta)B + (1 - B) = (1 - \theta)B + \nabla = \lambda B + \nabla$$

where $\lambda = 1 - \theta$, and the invertibility region in terms of $\lambda$ is defined by $0 < \lambda < 2$. Hence

$$\nabla z_t = \lambda a_{t-1} + \nabla a_t$$

Relative to some time origin $k < t$, applying the finite summation operator
$S_{t-k} = 1 + B + \cdots + B^{t-k-1} = (1 - B^{t-k})/(1 - B)$, we obtain

$$(1 - B^{t-k}) z_t = \lambda S_{t-k} a_{t-1} + (1 - B^{t-k}) a_t \tag{4.3.3}$$

so that

$$z_t = a_t + \lambda(a_{t-1} + a_{t-2} + \cdots + a_{k+1}) + (z_k - \theta a_k) \tag{4.3.4}$$

In comparison to $z_t = \sum_{j=0}^{t-k-1} \psi_j a_{t-j} + C_k(t-k)$, the weights are $\psi_0 = 1$, $\psi_j = \lambda$ for $j \ge 1$.
Also, the complementary function is $C_k(t-k) = z_k - \theta a_k = b_0^{(k)}$ (a "constant" $b_0$ for each
$k$), which is the solution of the difference equation $(1 - B)C_k(t-k) = 0$. Moreover, in the
infinite form $z_t = a_t + \lambda \sum_{j=1}^{\infty} a_{t-j}$, we may identify $b_0^{(k)}$ with $\lambda \sum_{j=t-k}^{\infty} a_{t-j}$. For this model,
then, the complementary function is simply a constant (i.e., a polynomial in $t$ of degree zero)
representing the current "level" of the process and associated with the particular origin of
reference $k$. If the origin is changed from $k-1$ to $k$, then $b_0$ is "updated" according to

$$b_0^{(k)} = b_0^{(k-1)} + \lambda a_k$$

since using (4.3.2), $b_0^{(k)} = z_k + (\lambda - 1) a_k = z_{k-1} - \theta a_{k-1} + \lambda a_k$.
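Both the truncated form (4.3.4) and the update of the level $b_0^{(k)}$ are algebraic identities, so they can be confirmed on any simulated path. The sketch below is illustrative only; the function names, the start-up choice $z_0 = a_0$, the seed, and the value $\theta = 0.6$ are assumptions made for this check.

```python
import random

def simulate_ima011(theta, n, seed=0):
    """Simulate z_t = z_{t-1} + a_t - theta * a_{t-1}, started at z_0 = a_0."""
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z = [a[0]]
    for t in range(1, n):
        z.append(z[t - 1] + a[t] - theta * a[t - 1])
    return z, a

def level(z, a, theta, k):
    """Complementary function b_0^(k) = z_k - theta * a_k."""
    return z[k] - theta * a[k]

theta = 0.6
lam = 1 - theta
z, a = simulate_ima011(theta, 50)

k, t = 10, 30
# (4.3.4): z_t = a_t + lam * (a_{t-1} + ... + a_{k+1}) + b_0^(k)
rebuilt = a[t] + lam * sum(a[k + 1:t]) + level(z, a, theta, k)
print(abs(rebuilt - z[t]) < 1e-9)

# Origin update: b_0^(k) = b_0^(k-1) + lam * a_k
updated = level(z, a, theta, k - 1) + lam * a[k]
print(abs(level(z, a, theta, k) - updated) < 1e-9)
```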

Inverted Form of Model. Finally, we can consider the model in the form

$$\pi(B) z_t = a_t$$

or equivalently, in the form

$$z_t = \sum_{j=1}^{\infty} \pi_j z_{t-j} + a_t = \bar{z}_{t-1}(\pi) + a_t$$

where $\bar{z}_{t-1}(\pi)$ is a weighted moving average of previous values of the process.
Using (4.2.21), the $\pi$ weights for the IMA(0, 1, 1) process are given by

$$(1 - \theta B)\pi(B) = 1 - B$$

that is,

$$\pi(B) = \frac{1 - B}{1 - \theta B} = \frac{1 - \theta B - (1 - \theta)B}{1 - \theta B} = 1 - (1 - \theta)(B + \theta B^2 + \theta^2 B^3 + \cdots)$$

so that

$$\pi_j = (1 - \theta)\theta^{j-1} = \lambda(1 - \lambda)^{j-1} \qquad j \ge 1$$

Thus, the process may be written as

$$z_t = \bar{z}_{t-1}(\lambda) + a_t \tag{4.3.5}$$

The weighted moving average of previous values of the process

$$\bar{z}_{t-1}(\lambda) = \lambda \sum_{j=1}^{\infty} (1 - \lambda)^{j-1} z_{t-j} \tag{4.3.6}$$

is, in this case, an exponentially weighted moving average (EWMA). This term reflects the
fact that the weights

$$\lambda, \quad \lambda(1 - \lambda), \quad \lambda(1 - \lambda)^2, \quad \lambda(1 - \lambda)^3, \quad \ldots$$

fall off exponentially (i.e., as a geometric progression) as $j$ increases. The weight function
for an IMA(0, 1, 1) process, with $\lambda = 0.4$ (or $\theta = 0.6$), is shown in Figure 4.7.
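The geometric decay of the EWMA weights is easy to tabulate. The short sketch below uses a hypothetical helper `ewma_weights` and the value $\lambda = 0.4$ quoted in the text; the number of weights computed (12) is an arbitrary choice.

```python
def ewma_weights(lam, n):
    """First n pi weights of an IMA(0,1,1) process: lam * (1 - lam)**(j - 1)."""
    return [lam * (1 - lam) ** (j - 1) for j in range(1, n + 1)]

weights = ewma_weights(0.4, 12)  # lambda = 0.4, i.e. theta = 0.6
# Successive weights shrink by the constant factor 1 - lambda = theta
ratios = [weights[j + 1] / weights[j] for j in range(len(weights) - 1)]
# The partial sum of n weights is 1 - (1 - lambda)**n, which tends to 1
partial_sum = sum(weights)
print([round(w, 4) for w in weights[:4]])
print(round(partial_sum, 4))
```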

Although the invertibility condition is satisfied for $0 < \lambda < 2$, in practice, we are most
often concerned with values of $\lambda$ between zero and 1 (i.e., $0 < \theta < 1$). We note that if $\lambda$
had a value equal to 1, the weight function would consist of a single spike ($\pi_1 = 1$, $\pi_j = 0$
for $j > 1$). As the value $\lambda$ approaches zero, the exponential weights die out more and more
slowly and the EWMA stretches back further into past values of the process. Finally, with
$\lambda = 0$ and $\theta = 1$, the model $(1 - B) z_t = (1 - B) a_t$ is equivalent to $z_t = \theta_0 + a_t$, with $\theta_0$
being given by the mean of all past values.

FIGURE 4.7 The $\pi$ weights for an IMA process of order (0, 1, 1) with $\lambda = 1 - \theta = 0.4$.

Since $b_0^{(k)} = z_k - \theta a_k = z_{k+1} - a_{k+1}$, or $z_{k+1} = b_0^{(k)} + a_{k+1}$, on comparison with
(4.3.5) it follows that for this process, the complementary function $b_0^{(k)} = C_k(t-k)$ in
(4.3.4) is

$$b_0^{(k)} = \bar{z}_k(\lambda) \tag{4.3.7}$$

an exponentially weighted average of values up to the origin $k$. In fact, (4.3.4) may be
written as

$$z_t = \bar{z}_k(\lambda) + \lambda \sum_{j=1}^{t-k-1} a_{t-j} + a_t$$

We have seen that the complementary function $C_k(t-k)$ can be thought of as telling
us what is known about the future value of the process at time $t$, based on knowledge of
the past when we are standing at time $k$. For the IMA(0, 1, 1) process, this takes the
form of information about the "level" or location of the process $b_0^{(k)} = \bar{z}_k(\lambda)$. At time
$k$, our knowledge of the future behavior of the process is that it will diverge from this
level in accordance with the "random walk" represented by $\lambda \sum_{j=1}^{t-k-1} a_{t-j} + a_t$, whose
expectation is zero and whose behavior we cannot predict. As soon as a new observation
is available, that is, as soon as we move our origin to time $k + 1$, the level will be updated
to $b_0^{(k+1)} = \bar{z}_{k+1}(\lambda)$.

Important Properties of the IMA(0, 1, 1) Process. Since the process is nonstationary, it
does not vary in a stable manner about a fixed mean. However, the exponentially weighted
moving average $\bar{z}_t(\lambda)$ can be regarded as measuring the local level of the process at
time $t$. From its definition (4.3.6), we obtain the well-known recursion formula for the
EWMA:

$$\bar{z}_t(\lambda) = \lambda z_t + (1 - \lambda)\bar{z}_{t-1}(\lambda) \tag{4.3.8}$$

This expression shows that for the IMA(0, 1, 1) model, each new level is arrived at by
interpolating between the new observation and the previous level. If $\lambda$ is equal to unity,
$\bar{z}_t(\lambda) = z_t$, which would ignore all evidence concerning location coming from previous
observations. On the other hand, if $\lambda$ had some value close to zero, $\bar{z}_t(\lambda)$ would rely
heavily on the previous value $\bar{z}_{t-1}(\lambda)$, which would have weight $1 - \lambda$. Only the small
weight $\lambda$ would be given to the new observation.

Now consider the two equations

$$\begin{aligned} z_t &= \bar{z}_{t-1}(\lambda) + a_t \\ \bar{z}_t(\lambda) &= \bar{z}_{t-1}(\lambda) + \lambda a_t \end{aligned} \tag{4.3.9}$$

the latter being obtained by substituting (4.3.5) in (4.3.8); it is also directly derivable from
(4.3.7).

It was pointed out by Muth (1960) that the two equations (4.3.9) provide a useful way of
thinking about the generation of the process. The first equation shows how, with the level
of the system at $\bar{z}_{t-1}(\lambda)$, a shock $a_t$ is added at time $t$ and produces the value $z_t$. However,
the second equation shows that only a proportion $\lambda$ of the shock is actually absorbed into
the level and has a lasting influence, the remaining proportion $\theta = 1 - \lambda$ of the shock being
dissipated. Now a new level $\bar{z}_t(\lambda)$ having been established by the absorption of $a_t$, a new
shock $a_{t+1}$ enters the system at time $t + 1$. Equations (4.3.9), with subscripts increased by
unity, will then show how this shock produces $z_{t+1}$ and how a proportion $\lambda$ of it is absorbed
into the system to produce the new level $\bar{z}_{t+1}(\lambda)$, and so on.

Equation (4.3.4) can be used to obtain variance and correlation features of the IMA(0, 1, 1)
process directly. For example, with reference to the origin $k$ and treating the initializing
function $b_0^{(k)}$ as constant, we find that

$$\mathrm{var}[z_t] = \sigma_a^2[1 + (t - k - 1)\lambda^2] \tag{4.3.10}$$

which does not converge as $t$ increases. We might also view this variance as, essentially,
the variance of the difference $z_t - z_k$, treating $a_k = 0$ in (4.3.4). In particular, in the
case of a random walk process, $z_t = z_{t-1} + a_t$, we have $\lambda = 1$, and this variance function
grows proportionally with $t - k$, whereas for more common situations with $0 < \lambda < 1$ (i.e.,
$0 < \theta < 1$) and especially for $\lambda$ close to zero, the variance function of $z_t - z_k$ grows much
more slowly with $t - k$. In addition, for $s > 0$, $\mathrm{cov}[z_t, z_{t+s}] = \sigma_a^2[\lambda + (t - k - 1)\lambda^2]$, which
implies that $\mathrm{corr}[z_t, z_{t+s}]$ will be close to 1 for $t - k$ large relative to $s$ (and $\lambda$ not close
to zero). Hence, it follows that adjacent values of the process will be highly positively
correlated, so the process will tend to exhibit rather smooth behavior (unless $\lambda$ is close to
zero).

The properties of the IMA(0, 1, 1) process with deterministic drift

$$\nabla z_t = \theta_0 + (1 - \theta_1 B) a_t$$

are discussed in Appendix A4.2.
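Muth's reading of the two equations (4.3.9) suggests a direct way to generate the process: add each shock to the current level, then absorb a proportion $\lambda$ of it into the level. The sketch below follows that recipe and checks that the resulting path satisfies both the EWMA recursion (4.3.8) and the difference equation form (4.3.2). The function name, the starting level of zero, the seed, and $\lambda = 0.4$ are assumptions made for illustration.

```python
import random

def generate_by_levels(lam, n, level0=0.0, seed=0):
    """Generate an IMA(0,1,1) path from the two equations (4.3.9):
    z_t = level_{t-1} + a_t  and  level_t = level_{t-1} + lam * a_t."""
    rng = random.Random(seed)
    z, levels, shocks = [], [level0], []
    for _ in range(n):
        a_t = rng.gauss(0.0, 1.0)
        z.append(levels[-1] + a_t)             # shock added to the previous level
        levels.append(levels[-1] + lam * a_t)  # only a proportion lam is absorbed
        shocks.append(a_t)
    return z, levels, shocks

lam = 0.4
theta = 1 - lam
z, levels, a = generate_by_levels(lam, 100)

# EWMA recursion (4.3.8): new level = lam * z_t + (1 - lam) * previous level.
# Here levels[t] is the level before z[t] is observed and levels[t + 1] the level after.
ewma_ok = all(
    abs(levels[t + 1] - (lam * z[t] + (1 - lam) * levels[t])) < 1e-9
    for t in range(len(z))
)
# Differencing recovers (4.3.2): z_t - z_{t-1} = a_t - theta * a_{t-1}
diff_ok = all(
    abs((z[t] - z[t - 1]) - (a[t] - theta * a[t - 1])) < 1e-9
    for t in range(1, len(z))
)
print(ewma_ok, diff_ok)
```

Setting `lam = 1.0` in this sketch gives a random walk, and values of `lam` near zero give a level that barely responds to each shock.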


4.3.2 Integrated Moving Average Process of Order (0,2,2)

Difference Equation Form. The IMA(0, 2, 2) process

$$\nabla^2 z_t = (1 - \theta_1 B - \theta_2 B^2) a_t \tag{4.3.11}$$

can be used to represent series exhibiting stochastic trends (e.g., see Fig. 4.6), and we now
study its general properties within the invertibility region:

$$-1 < \theta_2 < 1 \qquad \theta_2 + \theta_1 < 1 \qquad \theta_2 - \theta_1 < 1$$

Proceeding as before, $z_t$ can be written explicitly in terms of $z$'s and $a$'s as

$$z_t = 2z_{t-1} - z_{t-2} + a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2}$$

Alternatively, we can rewrite the right-hand operator in terms of differences:

$$1 - \theta_1 B - \theta_2 B^2 = (\lambda_0 \nabla + \lambda_1)B + \nabla^2$$

and on equating coefficients, we find expressions for the $\theta$'s in terms of the $\lambda$'s, and vice
versa, as follows:

$$\begin{array}{ll}
\theta_1 = 2 - \lambda_0 - \lambda_1 & \lambda_0 = 1 + \theta_2 \\
\theta_2 = \lambda_0 - 1 & \lambda_1 = 1 - \theta_1 - \theta_2
\end{array} \tag{4.3.12}$$

The IMA(0, 2, 2) model may then be rewritten as

$$\nabla^2 z_t = (\lambda_0 \nabla + \lambda_1) a_{t-1} + \nabla^2 a_t \tag{4.3.13}$$


There is an important advantage in using this form of the model, as compared with (4.3.11).
This stems from the fact that if we set $\lambda_1 = 0$ in (4.3.13), we obtain

$$\nabla z_t = [1 - (1 - \lambda_0)B] a_t$$

which corresponds to a (0, 1, 1) process, with $\theta = 1 - \lambda_0$. However, if we set $\theta_2 = 0$ in
(4.3.11), we obtain

$$\nabla^2 z_t = (1 - \theta_1 B) a_t$$

As will be shown in Chapter 5, for a series generated by the (0, 2, 2) model, the optimal
forecasts lie along a straight line, the level and slope of which are continually updated
as new data become available. By contrast, a series generated by a (0, 1, 1) model can
supply no information about slope but only about a continually updated level. It can be
an important question whether a linear trend, as well as the level, can be forecasted and
updated. When the choice is between these two models, this question turns on whether or
not $\lambda_1$ in (4.3.13) is zero.

The invertibility region for an IMA(0, 2, 2) process is the same as that given for an
MA(2) process in Chapter 3. It may be written in terms of the $\theta$'s and $\lambda$'s as follows:

$$\begin{array}{ll}
\theta_2 + \theta_1 < 1 & 0 < 2\lambda_0 + \lambda_1 < 4 \\
\theta_2 - \theta_1 < 1 & \lambda_1 > 0 \\
-1 < \theta_2 < 1 & \lambda_0 > 0
\end{array} \tag{4.3.14}$$

The triangular region for the $\theta$'s was shown in Figure 3.6 and the corresponding region for
the $\lambda$'s is shown in Figure 4.8.
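The relations (4.3.12) and the two statements of the invertibility region (4.3.14) can be put side by side in code. This is a small sketch with hypothetical function names; the test points in the loop are arbitrary values chosen here, except for $(0.9, -0.5)$ and $(1.5, -0.8)$, which appear in the text. For every point, the check on the $\theta$'s and the check on the corresponding $\lambda$'s should agree.

```python
def theta_to_lambda(theta1, theta2):
    """(4.3.12): lambda_0 = 1 + theta_2, lambda_1 = 1 - theta_1 - theta_2."""
    return 1 + theta2, 1 - theta1 - theta2

def lambda_to_theta(lam0, lam1):
    """(4.3.12): theta_1 = 2 - lambda_0 - lambda_1, theta_2 = lambda_0 - 1."""
    return 2 - lam0 - lam1, lam0 - 1

def invertible_theta(theta1, theta2):
    """Triangular region for the thetas in (4.3.14)."""
    return theta2 + theta1 < 1 and theta2 - theta1 < 1 and -1 < theta2 < 1

def invertible_lambda(lam0, lam1):
    """Corresponding region for the lambdas in (4.3.14)."""
    return 0 < 2 * lam0 + lam1 < 4 and lam1 > 0 and lam0 > 0

points = [(0.9, -0.5), (1.5, -0.8), (0.0, 0.0), (0.5, 0.6), (-1.2, 0.3)]
for th in points:
    lam = theta_to_lambda(*th)
    print(th, lam, invertible_theta(*th), invertible_lambda(*lam))
```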

Truncated and Infinite Random Shock Forms of Model. On applying the finite double
summation operator $S_{t-k}^{(2)}$, relative to a time origin $k$, to (4.3.13), we find that

$$[1 - B^{t-k} - (t-k)B^{t-k}(1 - B)] z_t = [\lambda_0(S_{t-k} - (t-k)B^{t-k}) + \lambda_1 S_{t-k}^{(2)}] a_{t-1} + [1 - B^{t-k} - (t-k)B^{t-k}(1 - B)] a_t$$

FIGURE 4.8 Invertibility region for parameters $\lambda_0$ and $\lambda_1$ of an IMA(0, 2, 2) process.

Hence, we obtain the truncated form of the random shock model as

$$\begin{aligned}
z_t &= \lambda_0 S_{t-k-1} a_{t-1} + \lambda_1 S_{t-k-1}^{(2)} a_{t-1} + a_t + b_0^{(k)} + b_1^{(k)}(t-k) \\
&= \lambda_0 \sum_{j=1}^{t-k-1} a_{t-j} + \lambda_1 \sum_{j=1}^{t-k-1} j a_{t-j} + a_t + C_k(t-k)
\end{aligned} \tag{4.3.15}$$

So, for this process, the $\psi$ weights are

$$\psi_0 = 1 \qquad \psi_1 = (\lambda_0 + \lambda_1) \qquad \cdots \qquad \psi_j = (\lambda_0 + j\lambda_1) \qquad \cdots$$

The complementary function is the solution of

$$(1 - B)^2 C_k(t-k) = 0$$

that is,

$$C_k(t-k) = b_0^{(k)} + b_1^{(k)}(t-k) \tag{4.3.16}$$

which is a polynomial in $(t-k)$ of degree 1 whose coefficients depend on the location of
the origin $k$. From (4.3.15), we find that these coefficients are given explicitly as

$$b_0^{(k)} = z_k - (1 - \lambda_0) a_k$$

$$b_1^{(k)} = z_k - z_{k-1} - (1 - \lambda_1) a_k + (1 - \lambda_0) a_{k-1}$$

Also, by considering the differences $b_0^{(k)} - b_0^{(k-1)}$ and $b_1^{(k)} - b_1^{(k-1)}$, it follows that if the
origin is updated from $k-1$ to $k$, then $b_0$ and $b_1$ are updated according to

$$\begin{aligned}
b_0^{(k)} &= b_0^{(k-1)} + b_1^{(k-1)} + \lambda_0 a_k \\
b_1^{(k)} &= b_1^{(k-1)} + \lambda_1 a_k
\end{aligned} \tag{4.3.17}$$

We see that when this model is appropriate, our expectation of the future behavior of
the series, judged from origin $k$, would be represented by the straight line (4.3.16), having
location $b_0^{(k)}$ and slope $b_1^{(k)}$. In practice, the process will, by time $t$, have diverged from this
line because of the influence of the random component

$$\lambda_0 \sum_{j=1}^{t-k-1} a_{t-j} + \lambda_1 \sum_{j=1}^{t-k-1} j a_{t-j} + a_t$$

which at time $k$ is unpredictable. Moreover, on moving from origin $k-1$ to origin $k$, the
intercept and slope are updated according to (4.3.17).
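The truncated form (4.3.15) and the updating equations (4.3.17) can be verified on a simulated IMA(0, 2, 2) path. The sketch below is illustrative only: the function names, zero start-up values, seed, and the choice of origin and target time are assumptions; the parameter values $(\theta_1, \theta_2) = (0.9, -0.5)$ are those used for Figure 4.9.

```python
import random

def simulate_ima022(theta1, theta2, n, seed=0):
    """Simulate z_t = 2 z_{t-1} - z_{t-2} + a_t - theta1 a_{t-1} - theta2 a_{t-2}
    with zero start-up values before t = 0."""
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z = []
    for t in range(n):
        z1 = z[t - 1] if t >= 1 else 0.0
        z2 = z[t - 2] if t >= 2 else 0.0
        a1 = a[t - 1] if t >= 1 else 0.0
        a2 = a[t - 2] if t >= 2 else 0.0
        z.append(2 * z1 - z2 + a[t] - theta1 * a1 - theta2 * a2)
    return z, a

def line_coefficients(z, a, lam0, lam1, k):
    """Intercept b_0^(k) and slope b_1^(k) of the complementary function."""
    b0 = z[k] - (1 - lam0) * a[k]
    b1 = z[k] - z[k - 1] - (1 - lam1) * a[k] + (1 - lam0) * a[k - 1]
    return b0, b1

theta1, theta2 = 0.9, -0.5
lam0, lam1 = 1 + theta2, 1 - theta1 - theta2
z, a = simulate_ima022(theta1, theta2, 60)

k, t = 20, 40
b0, b1 = line_coefficients(z, a, lam0, lam1, k)
# (4.3.15): shocks after the origin enter with weights lam0 + j * lam1
shocks = sum((lam0 + j * lam1) * a[t - j] for j in range(1, t - k))
rebuilt = shocks + a[t] + b0 + b1 * (t - k)

# (4.3.17): intercept and slope at origin k from those at origin k - 1
p0, p1 = line_coefficients(z, a, lam0, lam1, k - 1)
print(abs(rebuilt - z[t]) < 1e-9)
print(abs(b0 - (p0 + p1 + lam0 * a[k])) < 1e-9, abs(b1 - (p1 + lam1 * a[k])) < 1e-9)
```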

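Informally, through (4.3.15) we may also obtain the infinite random shock form as

$$z_t = \lambda_0 \sum_{j=1}^{\infty} a_{t-j} + \lambda_1 \sum_{j=1}^{\infty} j a_{t-j} + a_t = \lambda_0 S a_{t-1} + \lambda_1 S^2 a_{t-1} + a_t \tag{4.3.18}$$

So by comparison with (4.3.15), the complementary function can be represented informally
as

$$C_k(t-k) = \lambda_0 \sum_{j=t-k}^{\infty} a_{t-j} + \lambda_1 \sum_{j=t-k}^{\infty} j a_{t-j} = b_0^{(k)} + b_1^{(k)}(t-k)$$

By writing the second infinite sum above in the form

$$\sum_{j=t-k}^{\infty} j a_{t-j} = (t-k) \sum_{j=t-k}^{\infty} a_{t-j} + \sum_{j=t-k}^{\infty} [j - (t-k)] a_{t-j}$$

we see that the coefficients $b_0^{(k)}$ and $b_1^{(k)}$ can be associated with

$$\begin{aligned}
b_0^{(k)} &= \lambda_0 S a_k + \lambda_1 S^2 a_{k-1} = (\lambda_0 - \lambda_1) S a_k + \lambda_1 S^2 a_k \\
b_1^{(k)} &= \lambda_1 S a_k
\end{aligned}$$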

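Inverted Form of Model. Finally, we consider the model in the inverted form:

$$z_t = \sum_{j=1}^{\infty} \pi_j z_{t-j} + a_t = \bar{z}_{t-1}(\pi) + a_t$$

Using (4.2.22), we find on equating coefficients in

$$1 - 2B + B^2 = (1 - \theta_1 B - \theta_2 B^2)(1 - \pi_1 B - \pi_2 B^2 - \cdots)$$

that the $\pi$ weights of the IMA(0, 2, 2) process are

$$\begin{aligned}
\pi_1 &= 2 - \theta_1 = \lambda_0 + \lambda_1 \\
\pi_2 &= \theta_1(2 - \theta_1) - (1 + \theta_2) = \lambda_0 + 2\lambda_1 - (\lambda_0 + \lambda_1)^2 \\
(1 - \theta_1 B - \theta_2 B^2)\pi_j &= 0 \qquad j \ge 3
\end{aligned} \tag{4.3.19}$$

where $B$ now operates on $j$.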

If the roots of the characteristic equation $1 - \theta_1 B - \theta_2 B^2 = 0$ are real, the $\pi$ weights
are a mixture of two damped exponentials. If the roots are complex, the weights follow a
damped sine wave. Figure 4.9 shows the weights for a process with $\theta_1 = 0.9$ and $\theta_2 = -0.5$,
that is, $\lambda_0 = 0.5$ and $\lambda_1 = 0.6$. For these parameter values, the characteristic equation has
complex roots (the discriminant $\theta_1^2 + 4\theta_2 = -1.19$ is less than zero). Hence, the weights in
Figure 4.9 follow a damped sine wave, as expected.

FIGURE 4.9 The $\pi$ weights for an IMA(0, 2, 2) process with $\lambda_0 = 0.5$, $\lambda_1 = 0.6$.
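The weights plotted in Figure 4.9 can be recomputed from (4.3.19). The sketch below is a hedged illustration: the helper name `ima022_pi_weights` and the number of weights (200) are choices made here. It also evaluates the discriminant and the roots of the characteristic equation for $\theta_1 = 0.9$, $\theta_2 = -0.5$.

```python
import cmath

def ima022_pi_weights(theta1, theta2, n):
    """pi_1 and pi_2 from (4.3.19), then pi_j = theta1*pi_{j-1} + theta2*pi_{j-2}, j >= 3."""
    pi = [2 - theta1, theta1 * (2 - theta1) - (1 + theta2)]
    while len(pi) < n:
        pi.append(theta1 * pi[-1] + theta2 * pi[-2])
    return pi[:n]

theta1, theta2 = 0.9, -0.5
weights = ima022_pi_weights(theta1, theta2, 200)

# Roots of 1 - theta1*B - theta2*B^2 = 0, i.e. theta2*B^2 + theta1*B - 1 = 0
disc = theta1 ** 2 + 4 * theta2
roots = [(-theta1 + s * cmath.sqrt(disc)) / (2 * theta2) for s in (1, -1)]

print(round(disc, 2))                      # the discriminant quoted in the text
print([round(w, 4) for w in weights[:6]])  # leading weights of the damped sine wave
print(round(sum(weights), 6))              # partial sum of the pi weights
```

Because $d \ge 1$, the partial sum of the weights approaches unity, in line with (4.2.23), and the complex roots lie outside the unit circle.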

4.3.3 General Integrated Moving Average Process of Order (0, d, q) 

Difference Equation Form. The general integrated moving average process of order
(0, d, q) is

$$\nabla^d z_t = (1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q) a_t = \theta(B) a_t \tag{4.3.20}$$

where the zeros of $\theta(B)$ must lie outside the unit circle for the process to be invertible. This
model may be written explicitly in terms of past $z$'s and $a$'s in the form

$$z_t = d z_{t-1} - \tfrac{1}{2} d(d-1) z_{t-2} + \cdots + (-1)^{d+1} z_{t-d} + a_t - \theta_1 a_{t-1} - \cdots - \theta_q a_{t-q}$$

Random Shock Form of Model. To obtain $z_t$ in terms of the $a_t$'s, we write the right-hand
operator in (4.3.20) in terms of $\nabla = 1 - B$. In this way, we obtain

$$(1 - \theta_1 B - \cdots - \theta_q B^q) = (\lambda_{d-q} \nabla^{q-1} + \cdots + \lambda_0 \nabla^{d-1} + \cdots + \lambda_{d-1})B + \nabla^d \tag{4.3.21}$$

where, as before, the $\lambda$'s may be written explicitly in terms of the $\theta$'s, by equating
coefficients of $B$.

On substituting (4.3.21) in (4.3.20) and summing $d$ times, informally, we obtain

$$z_t = (\lambda_{d-q} \nabla^{q-d-1} + \cdots + \lambda_0 S + \cdots + \lambda_{d-1} S^d) a_{t-1} + a_t \tag{4.3.22}$$

Thus, for $q > d$, we notice that in addition to the $d$ sums, we pick up $q - d$ additional terms
$\nabla^{q-d-1} a_{t-1}, \ldots$ involving $a_{t-1}, a_{t-2}, \ldots, a_{t+d-q}$.

If we write this solution in terms of finite sums of $a$'s entering the system after some
origin $k$, we obtain the same form of equation, but with an added complementary function,
which is the solution of

$$\nabla^d C_k(t-k) = 0$$

that is, the polynomial

$$C_k(t-k) = b_0^{(k)} + b_1^{(k)}(t-k) + b_2^{(k)}(t-k)^2 + \cdots + b_{d-1}^{(k)}(t-k)^{d-1}$$

As before, the complementary function $C_k(t-k)$ represents the finite behavior of the
process, which is predictable at time $k$. Similarly, the coefficients $b_j^{(k)}$ may be expressed,
informally, in terms of the infinite sums up to origin $k$, that is, $S a_k, S^2 a_k, \ldots, S^d a_k$.
Accordingly, we can discover how the coefficients $b_j^{(k)}$ change as the origin is changed,
from $k-1$ to $k$.

Inverted Form of Model. Finally, the model can be expressed in the inverted form

$$\pi(B) z_t = a_t$$

or

$$z_t = \bar{z}_{t-1}(\pi) + a_t$$

The $\pi$ weights may be obtained by equating coefficients in (4.2.22), that is,

$$(1 - B)^d = (1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q)(1 - \pi_1 B - \pi_2 B^2 - \cdots) \tag{4.3.23}$$

This expression implies that for $j$ greater than the larger of $d$ and $q$, the $\pi$ weights satisfy
the homogeneous difference equation

$$\theta(B)\pi_j = 0$$

defined by the moving average operator. Hence, for sufficiently large $j$, the weights $\pi_j$
follow a mixture of damped exponentials and sine waves.

IMA Process of Order (0, 2, 3). One final special case of sufficient interest to merit
comment is the IMA process of order (0, 2, 3):

$$\nabla^2 z_t = (1 - \theta_1 B - \theta_2 B^2 - \theta_3 B^3) a_t$$

Proceeding as before, if we apply the finite double summation operator, this model can be
written in truncated random shock form as

$$z_t = \lambda_{-1} a_{t-1} + \lambda_0 \sum_{j=1}^{t-k-1} a_{t-j} + \lambda_1 \sum_{j=1}^{t-k-1} j a_{t-j} + a_t + b_0^{(k)} + b_1^{(k)}(t-k)$$

where the relations between the $\lambda$'s and $\theta$'s are

$$\begin{array}{ll}
\theta_1 = 2 - \lambda_{-1} - \lambda_0 - \lambda_1 & \lambda_{-1} = -\theta_3 \\
\theta_2 = \lambda_0 - 1 + 2\lambda_{-1} & \lambda_0 = 1 + \theta_2 + 2\theta_3 \\
\theta_3 = -\lambda_{-1} & \lambda_1 = 1 - \theta_1 - \theta_2 - \theta_3
\end{array}$$

Alternatively, it can be written, informally, in the infinite integrated form as

$$z_t = \lambda_{-1} a_{t-1} + \lambda_0 S a_{t-1} + \lambda_1 S^2 a_{t-1} + a_t$$

Finally, the invertibility region is defined by

$$\begin{array}{ll}
\theta_1 + \theta_2 + \theta_3 < 1 & \lambda_1 > 0 \\
-\theta_1 + \theta_2 - \theta_3 < 1 & 2\lambda_0 + \lambda_1 < 4(1 - \lambda_{-1}) \\
\theta_3(\theta_3 - \theta_1) - \theta_2 < 1 & \lambda_0(1 + \lambda_{-1}) > -\lambda_1 \lambda_{-1} \\
-1 < \theta_3 < 1 & -1 < \lambda_{-1} < 1
\end{array}$$

as is shown in Figure 4.10.

FIGURE 4.10 Invertibility region for parameters $\lambda_{-1}$, $\lambda_0$, and $\lambda_1$ of an IMA(0, 2, 3) process.
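The relations between the $\lambda$'s and the $\theta$'s follow mechanically from (4.3.21): subtract $\nabla^d$ from $\theta(B)$, divide by $B$, and re-expand the result in powers of $\nabla$ by substituting $B = 1 - \nabla$. The sketch below carries this out for general $d$ and $q$. The helper name `lambdas_from_thetas` and the numerical $\theta$ values in the last call are assumptions made for illustration only.

```python
from math import comb

def lambdas_from_thetas(theta, d):
    """Return {i: lambda_i} solving
    theta(B) = (sum_m lambda_{d-1-m} * nabla^m) * B + nabla^d, as in (4.3.21)."""
    q = len(theta)
    size = max(q, d) + 1
    theta_poly = [1.0] + [-th for th in theta] + [0.0] * (size - q - 1)
    nabla_poly = [(-1) ** i * comb(d, i) for i in range(d + 1)] + [0.0] * (size - d - 1)
    # theta(B) - nabla^d has no constant term; dividing by B drops index 0
    p = [theta_poly[i] - nabla_poly[i] for i in range(1, size)]
    lambdas = {}
    for m in range(len(p)):
        # coefficient of nabla^m after substituting B = 1 - nabla
        c_m = (-1) ** m * sum(p[i] * comb(i, m) for i in range(m, len(p)))
        lambdas[d - 1 - m] = c_m
    return lambdas

print(lambdas_from_thetas([0.6], 1))             # IMA(0,1,1): lambda_0 = 1 - theta
print(lambdas_from_thetas([0.9, -0.5], 2))       # IMA(0,2,2): compare with (4.3.12)
print(lambdas_from_thetas([0.4, 0.2, -0.1], 2))  # IMA(0,2,3): keys -1, 0, 1
```

For the (0, 2, 3) case the dictionary keys $-1$, $0$, $1$ correspond to $\lambda_{-1}$, $\lambda_0$, $\lambda_1$ in the relations given above.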

In Chapter 5, we show how forecasts of future values of a time series can be generated 
in an optimal manner when the model is an ARIMA process. In studying these forecasts, 
we make considerable use of the various model forms discussed in this chapter. 


APPENDIX A4.1 LINEAR DIFFERENCE EQUATIONS 

In this book, we are often concerned with linear difference equations. In particular, the
ARIMA model relates an output $z_t$ to an input $a_t$ in terms of the difference equation

$$z_t - \varphi_1 z_{t-1} - \varphi_2 z_{t-2} - \cdots - \varphi_{p'} z_{t-p'} = a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2} - \cdots - \theta_q a_{t-q} \tag{A4.1.1}$$

where $p' = p + d$.

Alternatively, we may write (A4.1.1) as

$$\varphi(B) z_t = \theta(B) a_t$$

where

$$\varphi(B) = 1 - \varphi_1 B - \varphi_2 B^2 - \cdots - \varphi_{p'} B^{p'}$$

$$\theta(B) = 1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q$$

We now derive an expression for the general solution of the difference equation (A4.1.1)
relative to an origin $k < t$.

1. We show that the general solution may be written as

$$z_t = C_k(t-k) + I_k(t-k)$$

where $C_k(t-k)$ is the complementary function and $I_k(t-k)$ is a "particular integral."

2. We then derive a general expression for the complementary function $C_k(t-k)$.

3. Finally, we derive a general expression for a particular integral $I_k(t-k)$.


General Solution. The argument is identical to that for the solution of linear differential
or linear algebraic equations. Suppose that $z'_t$ is any particular solution of

$$\varphi(B) z_t = \theta(B) a_t \tag{A4.1.2}$$

that is, it satisfies

$$\varphi(B) z'_t = \theta(B) a_t \tag{A4.1.3}$$

On subtracting (A4.1.3) from (A4.1.2), we obtain

$$\varphi(B)(z_t - z'_t) = 0$$

Thus $z''_t = z_t - z'_t$ satisfies

$$\varphi(B) z''_t = 0 \tag{A4.1.4}$$

Now

$$z_t = z'_t + z''_t$$

and hence the general solution of (A4.1.2) is the sum of the complementary function $z''_t$,
which is the general solution of the homogeneous difference equation (A4.1.4), and a
particular integral $z'_t$, which is any particular solution of (A4.1.2). Relative to any origin
$k < t$, we denote the complementary function $z''_t$ by $C_k(t-k)$ and the particular integral
$z'_t$ by $I_k(t-k)$.


Evaluation of the Complementary Function. 


Distinct Roots. Consider the homogeneous difference equation

$$\varphi(B) z_t = 0 \tag{A4.1.5}$$

where

$$\varphi(B) = (1 - G_1 B)(1 - G_2 B) \cdots (1 - G_{p'} B) \tag{A4.1.6}$$

and where we assume in the first instance that $G_1, G_2, \ldots, G_{p'}$ are distinct. Then, it is shown
below that the general solution of (A4.1.5) at time $t$, when the series is referred to an origin
at time $k$, is

$$z_t = A_1 G_1^{t-k} + A_2 G_2^{t-k} + \cdots + A_{p'} G_{p'}^{t-k} \tag{A4.1.7}$$

where the $A_i$'s are constants. Thus, a real root $G$ of $\varphi(B) = 0$ contributes a damped
exponential term $G^{t-k}$ to the complementary function. A pair of complex roots contributes
a damped sine wave term $D^{t-k} \sin(2\pi f_0 t + F)$.

To see that the expression given in (A4.1.7) does satisfy (A4.1.5), we can substitute
(A4.1.7) in (A4.1.5) to give

$$\varphi(B)(A_1 G_1^{t-k} + A_2 G_2^{t-k} + \cdots + A_{p'} G_{p'}^{t-k}) = 0 \tag{A4.1.8}$$

Now consider

$$\varphi(B) G_i^{t-k} = (1 - \varphi_1 B - \varphi_2 B^2 - \cdots - \varphi_{p'} B^{p'}) G_i^{t-k} = G_i^{t-k-p'}\bigl(G_i^{p'} - \varphi_1 G_i^{p'-1} - \cdots - \varphi_{p'}\bigr)$$

We see that $\varphi(B) G_i^{t-k}$ vanishes for each value of $i$ if

$$G_i^{p'} - \varphi_1 G_i^{p'-1} - \cdots - \varphi_{p'} = 0$$

that is, if $B_i = 1/G_i$ is a root of $\varphi(B) = 0$. Now, since (A4.1.6) implies that the roots of
$\varphi(B) = 0$ are $B_i = 1/G_i$, it follows that $\varphi(B) G_i^{t-k}$ is zero for all $i$ and hence (A4.1.8) holds,
confirming that (A4.1.7) is a general solution of (A4.1.5).

To prove (A4.1.7) directly, consider the special case of the second-order equation:

$$(1 - G_1 B)(1 - G_2 B) z_t = 0$$

which we can write as

$$(1 - G_1 B) y_t = 0 \tag{A4.1.9}$$

where

$$y_t = (1 - G_2 B) z_t \tag{A4.1.10}$$

Now (A4.1.9) implies that

$$y_t = G_1 y_{t-1} = G_1^2 y_{t-2} = \cdots = G_1^{t-k} y_k$$

and hence

$$y_t = D_1 G_1^{t-k}$$

where $D_1 = y_k$ is a constant determined by the starting value $y_k$. Hence (A4.1.10) may be
written as

$$\begin{aligned}
z_t &= G_2 z_{t-1} + D_1 G_1^{t-k} \\
&= G_2(G_2 z_{t-2} + D_1 G_1^{t-k-1}) + D_1 G_1^{t-k} \\
&\;\;\vdots \\
&= G_2^{t-k} z_k + D_1(G_1^{t-k} + G_2 G_1^{t-k-1} + \cdots + G_2^{t-k-1} G_1) \\
&= G_2^{t-k} z_k + \frac{D_1}{1 - G_2/G_1}(G_1^{t-k} - G_2^{t-k}) \\
&= A_1 G_1^{t-k} + A_2 G_2^{t-k}
\end{aligned} \tag{A4.1.11}$$

where $A_1$, $A_2$ are constants determined by the starting values of the series. By an extension
of the argument above, it may be shown that the general solution of (A4.1.5), when the
roots of $\varphi(B) = 0$ are distinct, is given by (A4.1.7).
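The second-order result can be checked by iterating the homogeneous equation numerically and comparing with the closed form $A_1 G_1^{t-k} + A_2 G_2^{t-k}$. In the sketch below the function names, the values $G_1 = 0.8$, $G_2 = -0.5$, and the two starting values are arbitrary assumptions; the constants $A_1$ and $A_2$ are obtained by matching the closed form to the starting values $z_k$ and $z_{k+1}$.

```python
def solve_homogeneous(g1, g2, z_k, z_k1, steps):
    """Iterate (1 - g1 B)(1 - g2 B) z_t = 0, i.e.
    z_t = (g1 + g2) z_{t-1} - g1 g2 z_{t-2}, starting from z_k and z_{k+1}."""
    z = [z_k, z_k1]
    for _ in range(steps - 1):
        z.append((g1 + g2) * z[-1] - g1 * g2 * z[-2])
    return z  # z[n] is the value at time k + n

def closed_form(g1, g2, z_k, z_k1, n):
    """A1 * g1**n + A2 * g2**n with A1, A2 fixed by the two starting values."""
    a1 = (z_k1 - g2 * z_k) / (g1 - g2)
    a2 = z_k - a1
    return a1 * g1 ** n + a2 * g2 ** n

g1, g2 = 0.8, -0.5  # distinct values (assumed for the example)
z = solve_homogeneous(g1, g2, 1.0, 0.3, 20)
errors = [abs(z[n] - closed_form(g1, g2, 1.0, 0.3, n)) for n in range(len(z))]
print(max(errors) < 1e-12)
```

The division by `g1 - g2` makes clear why the equal-roots case needs separate treatment.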


Equal Roots. Suppose that $\varphi(B) = 0$ has $d$ equal roots $G_0^{-1}$, so that $\varphi(B)$ contains a factor
$(1 - G_0 B)^d$. In particular, consider the solution (A4.1.11) for the second-order equation
when both $G_1$ and $G_2$ are equal to $G_0$. Then, (A4.1.11) reduces to

$$z_t = G_0^{t-k} z_k + D_1 G_0^{t-k}(t-k)$$

or

$$z_t = [A_0 + A_1(t-k)] G_0^{t-k}$$

In general, if there are $d$ equal roots $G_0$, it may be verified by direct substitution in
(A4.1.5) that the general solution is

$$z_t = [A_0 + A_1(t-k) + A_2(t-k)^2 + \cdots + A_{d-1}(t-k)^{d-1}] G_0^{t-k} \tag{A4.1.12}$$

In particular, when the equal roots $G_0$ are all equal to unity as in the IMA(0, d, q) process,
the solution is

$$z_t = A_0 + A_1(t-k) + A_2(t-k)^2 + \cdots + A_{d-1}(t-k)^{d-1} \tag{A4.1.13}$$

that is, a polynomial in $t-k$ of degree $d-1$.
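For the unit-root case, the claim can be verified by direct differencing: a polynomial of degree $d-1$ in $t-k$ is annihilated by $\nabla^d$ but not by $\nabla^{d-1}$. The sketch below uses $d = 3$ and arbitrary coefficients chosen for illustration; the helper name `difference` is hypothetical.

```python
def difference(values, d):
    """Apply nabla = 1 - B to a list of values d times."""
    for _ in range(d):
        values = [values[i] - values[i - 1] for i in range(1, len(values))]
    return values

d = 3
coeffs = [2.0, -1.5, 0.25]  # A_0, A_1, A_2 (arbitrary example values)
# Polynomial of degree d - 1 evaluated at n = t - k = 0, 1, ..., 11
poly = [sum(c * n ** j for j, c in enumerate(coeffs)) for n in range(12)]

print(difference(poly, d))      # every entry is zero: the polynomial solves nabla^d z_t = 0
print(difference(poly, d - 1))  # constant and nonzero: one fewer difference is not enough
```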

In general, when $\varphi(B)$ factors according to

$$(1 - G_1 B)(1 - G_2 B) \cdots (1 - G_p B)(1 - G_0 B)^d$$

the complementary function is

$$C_k(t-k) = G_0^{t-k} \sum_{j=0}^{d-1} A_j (t-k)^j + \sum_{i=1}^{p} D_i G_i^{t-k} \tag{A4.1.14}$$

Thus, in general, the complementary function consists of a mixture of damped exponential
terms $G^{t-k}$, polynomial terms $(t-k)^j$, damped sine wave terms of the form
$D^{t-k} \sin(2\pi f_0 t + F)$, and combinations of these functions.

Evaluation of the "Particular Integral". We now show that a particular integral $I_k(s-k)$,
satisfying

$$\varphi(B) I_k(t-k) = \theta(B) a_t \qquad t - k > q \tag{A4.1.15}$$

is a function defined as follows:

$$\begin{aligned}
I_k(s-k) &= 0 \qquad s \le k \\
I_k(1) &= a_{k+1} \\
I_k(2) &= a_{k+2} + \psi_1 a_{k+1} \\
&\;\;\vdots \\
I_k(t-k) &= a_t + \psi_1 a_{t-1} + \psi_2 a_{t-2} + \cdots + \psi_{t-k-1} a_{k+1} \qquad t > k
\end{aligned} \tag{A4.1.16}$$

where the $\psi$ weights are those appearing in the form (4.2.3) of the model. Thus, the $\psi$
weights satisfy

$$\varphi(B)\psi(B) a_t = \theta(B) a_t \tag{A4.1.17}$$

Now the terms on the left-hand side of (A4.1.17) may be set out as follows:

$$\begin{array}{l|l}
a_t + \psi_1 a_{t-1} + \psi_2 a_{t-2} + \cdots + \psi_{t-k-1} a_{k+1} & {} + \psi_{t-k} a_k + \cdots \\
-\varphi_1 (a_{t-1} + \psi_1 a_{t-2} + \cdots + \psi_{t-k-2} a_{k+1} & {} + \psi_{t-k-1} a_k + \cdots) \\
-\varphi_2 (a_{t-2} + \cdots + \psi_{t-k-3} a_{k+1} & {} + \psi_{t-k-2} a_k + \cdots) \\
\qquad \vdots & \qquad \vdots \\
-\varphi_{p'} (a_{t-p'} + \cdots + \psi_{t-k-p'-1} a_{k+1} & {} + \psi_{t-k-p'} a_k + \cdots)
\end{array} \tag{A4.1.18}$$

Since the right-hand side of (A4.1.17) is

$$a_t - \theta_1 a_{t-1} - \cdots - \theta_q a_{t-q}$$

it follows that the first $q + 1$ columns in this array sum to $a_t, -\theta_1 a_{t-1}, \ldots, -\theta_q a_{t-q}$. Now the
left-hand term in (A4.1.15), where $I_k(s-k)$ is given by (A4.1.16), is equal to the sum of
the terms in the first $(t-k)$ columns of the array, that is, those to the left of the vertical line.
Therefore, if $t - k > q$, that is, the vertical line is drawn after at least $q + 1$ columns, the sum of
all terms up to the vertical line is equal to $\theta(B) a_t$. This shows that (A4.1.16) is a particular
integral of the difference equation.

Example. Consider the IMA(0, 1, 1) process

$$z_t - z_{t-1} = a_t - \theta a_{t-1} \tag{A4.1.19}$$

for which $\psi_j = 1 - \theta$ for $j \ge 1$. Then

$$\begin{aligned}
I_k(0) &= 0 \\
I_k(1) &= a_{k+1} \\
&\;\;\vdots \\
I_k(t-k) &= a_t + (1-\theta)\sum_{j=1}^{t-k-1}a_{t-j} \qquad t - k > 1
\end{aligned} \tag{A4.1.20}$$

Now if $z_t = I_k(t-k)$ is a solution of (A4.1.19), then

$$I_k(t-k) - I_k(t-k-1) = a_t - \theta a_{t-1}$$

and, as is easily verified, while this is not satisfied by (A4.1.20) for $t - k = 1$, it is satisfied by (A4.1.20) for $t - k > 1$, that is, for $t - k > q$.
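
The "easily verified" claim can be checked numerically. The following is a minimal sketch, assuming simulated Gaussian shocks and an arbitrary value $\theta = 0.6$; the function name `particular_integral`, the origin `k = 5`, and the seed are illustrative choices, not part of the text.

```python
import random

def particular_integral(a, k, t, theta):
    """I_k(t - k) for the IMA(0,1,1) model, following (A4.1.20).

    `a` maps a time index to the shock a_t; the value is 0 for t <= k.
    """
    if t <= k:
        return 0.0
    # a_t + (1 - theta) * sum_{j=1}^{t-k-1} a_{t-j}
    return a[t] + (1.0 - theta) * sum(a[t - j] for j in range(1, t - k))

random.seed(0)
theta, k = 0.6, 5                     # illustrative values (assumptions)
a = {s: random.gauss(0.0, 1.0) for s in range(k, k + 12)}

def residual(t):
    """Left side minus right side of the difference equation (A4.1.19)."""
    lhs = (particular_integral(a, k, t, theta)
           - particular_integral(a, k, t - 1, theta))
    rhs = a[t] - theta * a[t - 1]
    return lhs - rhs

for t in range(k + 1, k + 6):
    print(t - k, round(residual(t), 12))
```

The residual is nonzero only at $t - k = 1$, where it equals $\theta a_k$; for every later $t$ it vanishes up to rounding, in line with the condition $t - k > q$ with $q = 1$.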

APPENDIX A4.2 IMA(0,1,1) PROCESS WITH DETERMINISTIC DRIFT 

The general model $\phi(B)\nabla^d z_t = \theta_0 + \theta(B)a_t$ can also be written as

$$\phi(B)\nabla^d z_t = \theta(B)\varepsilon_t$$

with the shocks $\varepsilon_t$ having a nonzero mean $\xi = \theta_0/(1 - \theta_1 - \cdots - \theta_q)$. For example, the IMA(0, 1, 1) model is then

$$\nabla z_t = (1 - \theta B)\varepsilon_t$$

with $E[\varepsilon_t] = \xi = \theta_0/(1 - \theta)$. In this form, $z_t$ could represent, for example, the outlet temperature from a reactor when heat was being supplied from a heating element at a fixed rate. Now if

$$\varepsilon_t = \xi + a_t \tag{A4.2.1}$$

where $a_t$ is white noise with zero mean, then with reference to a time origin $k$, the integrated form of the model is

$$z_t = b_0^{(k)} + \lambda\sum_{j=1}^{t-k-1}\varepsilon_{t-j} + \varepsilon_t \tag{A4.2.2}$$

with $\lambda = 1 - \theta$. Substituting (A4.2.1) in (A4.2.2), the model written in terms of the $a$'s is

$$z_t = b_0^{(k)} + \lambda\xi(t-k-1) + \xi + \lambda\sum_{j=1}^{t-k-1}a_{t-j} + a_t \tag{A4.2.3}$$

Thus, we see that $z_t$ contains a deterministic slope or drift due to the term $\lambda\xi(t-k-1)$, with the slope of the deterministic linear trend equal to $\lambda\xi = \theta_0$. Moreover, if we denote the “level” of the process at time $t - 1$ by $l_{t-1}$, where

$$z_t = l_{t-1} + a_t$$

we see that the level is changed from time $t - 1$ to time $t$, according to

$$l_t = l_{t-1} + \lambda\xi + \lambda a_t$$

The change in the level, thus, contains a deterministic component $\lambda\xi = \theta_0$, as well as a stochastic component $\lambda a_t$.
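
As a numerical illustration of (A4.2.3), the sketch below builds $z_t$ from the integrated form and checks both the difference equation $\nabla z_t = \theta_0 + a_t - \theta a_{t-1}$ and the level update. The parameter values, the starting level `b0`, and the simulated shocks are assumptions made only for this example.

```python
import random

random.seed(1)
theta, theta0 = 0.4, 0.3          # illustrative parameters (assumptions)
lam = 1.0 - theta
xi = theta0 / (1.0 - theta)       # mean of the shocks eps_t

k, n = 0, 30
b0 = 10.0                         # hypothetical starting level b_0^{(k)}
a = {s: random.gauss(0.0, 1.0) for s in range(k + 1, k + n + 1)}

def z_integrated(t):
    """z_t from the integrated form (A4.2.3)."""
    past = sum(a[t - j] for j in range(1, t - k))
    return b0 + lam * xi * (t - k - 1) + xi + lam * past + a[t]

z = {t: z_integrated(t) for t in range(k + 1, k + n + 1)}

# Difference equation: z_t - z_{t-1} = theta0 + a_t - theta * a_{t-1}
max_gap = max(
    abs((z[t] - z[t - 1]) - (theta0 + a[t] - theta * a[t - 1]))
    for t in range(k + 2, k + n + 1)
)

# Level l_t = z_{t+1} - a_{t+1}; its increment should be lam*xi + lam*a_t
level = {t: z[t + 1] - a[t + 1] for t in range(k, k + n)}
level_gap = max(
    abs((level[t] - level[t - 1]) - (lam * xi + lam * a[t]))
    for t in range(k + 1, k + n)
)

print(max_gap, level_gap)         # both are zero up to rounding error
```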


APPENDIX A4.3 ARIMA PROCESSES WITH ADDED NOISE 

In this appendix, we consider the effect of added noise (e.g., measurement error) to a general ARIMA(p, d, q) process. The results are also relevant to determine the nature of the reduced form ARIMA model of an observed process in structural component models (see Section 9.4), in which an observed series $Z_t$ is presumed to be represented as the sum of two unobservable component processes that follow specified ARIMA models.


A4.3.1 Sum of Two Independent Moving Average Processes 

As a necessary preliminary to what follows, consider a stochastic process $w_t$, which is the sum of two independent moving average processes of orders $q_1$ and $q_2$, respectively. That is,

$$w_t = w_{1t} + w_{2t} = \theta_1(B)a_t + \theta_2(B)b_t \tag{A4.3.1}$$

where $\theta_1(B)$ and $\theta_2(B)$ are polynomials in $B$, of orders $q_1$ and $q_2$, and the white noise processes $a_t$ and $b_t$ have zero means, variances $\sigma_a^2$ and $\sigma_b^2$, and are mutually independent. Suppose that $q = \max(q_1, q_2)$; then since

$$\gamma_j(w) = \gamma_j(w_1) + \gamma_j(w_2)$$

it is clear that the autocovariance function $\gamma_j(w)$ for $w_t$ must be zero for $j > q$. It follows that there exists a representation of $w_t$ as a single MA($q$) process:

$$w_t = \theta(B)u_t \tag{A4.3.2}$$

where $u_t$ is a white noise process with mean zero and variance $\sigma_u^2$. Thus, the sum of two independent moving average processes is another moving average process, whose order is the same as that of the component process of higher order.

The parameters in the MA($q$) model can be deduced by equating the autocovariances of $w_t$, as determined from the representation in (A4.3.1), with the autocovariances of the basic MA($q$) model (A4.3.2), as given in Section 3.3.2. For an example, suppose that $w_{1t} = \theta_1(B)a_t = (1 - \theta_{1,1}B)a_t$ is MA(1) and $w_{2t} = \theta_2(B)b_t = (1 - \theta_{1,2}B - \theta_{2,2}B^2)b_t$ is MA(2), so that $w_t = \theta(B)u_t$ is MA(2) with

$$\begin{aligned}
w_t &= (1 - \theta_{1,1}B)a_t + (1 - \theta_{1,2}B - \theta_{2,2}B^2)b_t \\
&= (1 - \theta_1B - \theta_2B^2)u_t
\end{aligned}$$

The parameters of the MA(2) model for $w_t$ can be determined by considering

$$\begin{aligned}
\gamma_0(w) &= (1 + \theta_{1,1}^2)\sigma_a^2 + (1 + \theta_{1,2}^2 + \theta_{2,2}^2)\sigma_b^2 = (1 + \theta_1^2 + \theta_2^2)\sigma_u^2 \\
\gamma_1(w) &= -\theta_{1,1}\sigma_a^2 + (-\theta_{1,2} + \theta_{1,2}\theta_{2,2})\sigma_b^2 = (-\theta_1 + \theta_1\theta_2)\sigma_u^2 \\
\gamma_2(w) &= -\theta_{2,2}\sigma_b^2 = -\theta_2\sigma_u^2
\end{aligned}$$





and solving for $\theta_1$, $\theta_2$, and $\sigma_u^2$ in terms of given values for the autocovariances $\gamma_0(w)$, $\gamma_1(w)$, $\gamma_2(w)$ as determined from the left-hand-side expressions for these.


A4.3.2 Effect of Added Noise on the General Model 

Correlated Noise. Consider the general nonstationary model for the process $z_t$ of order (p, d, q):

$$\phi(B)\nabla^d z_t = \theta(B)a_t \tag{A4.3.3}$$

Suppose that we cannot observe $z_t$ itself, but only $Z_t = z_t + b_t$, where $b_t$ represents some extraneous noise (e.g., measurement error) or simply some additional unobserved component that together with $z_t$ forms the observed process $Z_t$, and $b_t$ may be autocorrelated. We wish to determine the nature of the model for the observed process $Z_t$. In general, applying $\phi(B)\nabla^d$ to both sides of $Z_t = z_t + b_t$, we have

$$\phi(B)\nabla^d Z_t = \theta(B)a_t + \phi(B)\nabla^d b_t$$

If the noise $b_t$ follows a stationary ARMA process of order $(p_1, 0, q_1)$,

$$\phi_1(B)b_t = \theta_1(B)\alpha_t \tag{A4.3.4}$$

where $\alpha_t$ is a white noise process independent of the $a_t$ process, then

$$\underbrace{\phi_1(B)\phi(B)\nabla^d}_{p_1+p+d}\,Z_t = \underbrace{\phi_1(B)\theta(B)}_{p_1+q}\,a_t + \underbrace{\phi(B)\theta_1(B)\nabla^d}_{p+q_1+d}\,\alpha_t \tag{A4.3.5}$$

where the values below the braces indicate the degrees of the various polynomials in $B$. Now the right-hand side of (A4.3.5) is of the form (A4.3.1). Let $P = p_1 + p$ and $Q$ be equal to whichever of $(p_1 + q)$ and $(p + q_1 + d)$ is larger. Then we can write

$$\phi_2(B)\nabla^d Z_t = \theta_2(B)u_t$$

with $u_t$ a white noise process, and the $Z_t$ process is seen to be an ARIMA of order $(P, d, Q)$. The stationary AR operator in the ARIMA model for $Z_t$ is determined as $\phi_2(B) = \phi_1(B)\phi(B)$, and the parameters of the MA operator $\theta_2(B)$ and $\sigma_u^2$ are determined in the same manner as described in Section A4.3.1, that is, by equating the nonzero autocovariances from the representations:

$$\phi_1(B)\theta(B)a_t + \phi(B)\theta_1(B)\nabla^d\alpha_t = \theta_2(B)u_t$$

Added White Noise. If, as might be true in some applications, the added noise is white, then $\phi_1(B) = \theta_1(B) = 1$ in (A4.3.4), and we obtain

$$\phi(B)\nabla^d Z_t = \theta_2(B)u_t \tag{A4.3.6}$$

with

$$\theta_2(B)u_t = \theta(B)a_t + \phi(B)\nabla^d b_t$$

which is of order $(p, d, Q)$ where $Q$ is the larger of $q$ and $(p + d)$. If $p + d \le q$, the order of the process with error is the same as that of the original process. The only effect of the added white noise is to change the values of the $\theta$'s (but not the $\phi$'s).

Effect of Added White Noise on an Integrated Moving Average Process. In particular, an IMA process of order (0, d, q), with white noise added, remains an IMA of order (0, d, q) if $d \le q$; otherwise, it becomes an IMA of order (0, d, d). In either case, the parameters of the process are changed by the addition of noise, with the representation $\nabla^d Z_t = \theta_2(B)u_t$ as in (A4.3.6). The nature of these changes can be determined by equating the autocovariances of the $d$th differences of the process, with added noise, to those of the $d$th differences of a simple IMA process, that is, as a special case of the above, by equating the nonzero autocovariances in the representation

$$\theta(B)a_t + \nabla^d b_t = \theta_2(B)u_t$$

The procedure will now be illustrated with an example.


A4.3.3 Example for an IMA(0,1,1) Process with Added White Noise 

Consider the properties of the process $Z_t = z_t + b_t$ when

$$z_t = z_{t-1} - (1 - \lambda)a_{t-1} + a_t \tag{A4.3.7}$$

and the $b_t$ and $a_t$ are mutually independent white noise processes. The $Z_t$ process has first difference $W_t = Z_t - Z_{t-1}$ given by

$$W_t = [1 - (1 - \lambda)B]a_t + (1 - B)b_t \tag{A4.3.8}$$

The autocovariances for the first differences $W_t$ are

$$\begin{aligned}
\gamma_0 &= \sigma_a^2[1 + (1 - \lambda)^2] + 2\sigma_b^2 \\
\gamma_1 &= -\sigma_a^2(1 - \lambda) - \sigma_b^2 \\
\gamma_j &= 0 \qquad j \ge 2
\end{aligned} \tag{A4.3.9}$$

The fact that the $\gamma_j$ are zero beyond the first lag confirms that the process with added noise is, as expected, an IMA process of order (0, 1, 1). To obtain explicitly the parameters of the IMA that represents the noisy process, we suppose that it can be written as

$$Z_t = Z_{t-1} - (1 - \Lambda)u_{t-1} + u_t \tag{A4.3.10}$$

where $u_t$ is a white noise process. The process (A4.3.10) has first differences $W_t = Z_t - Z_{t-1}$ with autocovariances

$$\begin{aligned}
\gamma_0 &= \sigma_u^2[1 + (1 - \Lambda)^2] \\
\gamma_1 &= -\sigma_u^2(1 - \Lambda) \\
\gamma_j &= 0 \qquad j \ge 2
\end{aligned} \tag{A4.3.11}$$

Equating (A4.3.9) and (A4.3.11), we can solve for $\Lambda$ and $\sigma_u^2$ explicitly. Thus

$$\frac{\Lambda^2}{1 - \Lambda} = \frac{\lambda^2}{1 - \lambda + \sigma_b^2/\sigma_a^2} \qquad\qquad \sigma_u^2 = \sigma_a^2\frac{\lambda^2}{\Lambda^2} \tag{A4.3.12}$$

Suppose, for example, that the original series has $\lambda = 0.5$ and $\sigma_b^2 = \sigma_a^2$; then, $\Lambda = 0.333$ and $\sigma_u^2 = 2.25\sigma_a^2$.
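
The numerical example can be reproduced by solving the quadratic implied by (A4.3.12). The sketch below is one way to do so; the function name `noisy_ima011` is hypothetical, and the root in $(0, 1)$ is selected as the invertible solution.

```python
import math

def noisy_ima011(lam, var_a, var_b):
    """Return (Lambda, var_u) for an IMA(0,1,1) process observed with white noise.

    Solves Lambda^2 / (1 - Lambda) = c with c = lam^2 / (1 - lam + var_b/var_a),
    taking the positive root, then var_u = var_a * lam^2 / Lambda^2 (A4.3.12).
    """
    c = lam ** 2 / (1.0 - lam + var_b / var_a)
    # Positive root of Lambda^2 + c*Lambda - c = 0
    big_lam = (-c + math.sqrt(c * c + 4.0 * c)) / 2.0
    var_u = var_a * lam ** 2 / big_lam ** 2
    return big_lam, var_u

big_lam, var_u = noisy_ima011(lam=0.5, var_a=1.0, var_b=1.0)
print(round(big_lam, 3), round(var_u, 2))

# Cross-check against the autocovariances (A4.3.9) and (A4.3.11)
gamma0_noisy = 1.0 * (1 + (1 - 0.5) ** 2) + 2 * 1.0
gamma0_ima = var_u * (1 + (1 - big_lam) ** 2)
```

With $\lambda = 0.5$ and equal variances, the computed values agree with $\Lambda = 1/3$ and $\sigma_u^2 = 2.25\sigma_a^2$ quoted in the text, and the lag-zero autocovariances of the two representations coincide.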


A4.3.4 Relation between the IMA(0,1,1) Process and a Random Walk 

The process

$$z_t = z_{t-1} + a_t \tag{A4.3.13}$$

which is an IMA(0, 1, 1) process, with $\lambda = 1$ ($\theta = 0$), is called a random walk. If the $a_t$ are steps taken forward or backward at time $t$, then $z_t$ will represent the position of the walker at time $t$.

Any IMA(0, 1, 1) process can be thought of as a random walk buried in white noise $b_t$, uncorrelated with the shocks $a_t$ associated with the random walk process. If the noisy process is $Z_t = z_t + b_t$, where $z_t$ is defined by (A4.3.13), then using (A4.3.12), we have

$$Z_t = Z_{t-1} - (1 - \Lambda)u_{t-1} + u_t$$

with

$$\frac{\Lambda^2}{1 - \Lambda} = \frac{\sigma_a^2}{\sigma_b^2} \qquad\qquad \sigma_u^2 = \frac{\sigma_a^2}{\Lambda^2} \tag{A4.3.14}$$


A4.3.5 Autocovariance Function of the General Model with Added Correlated Noise

Suppose that the basic process is an ARIMA process of order (p, d, q):

$$\phi(B)\nabla^d z_t = \theta(B)a_t$$

and that $Z_t = z_t + b_t$ is observed, where the stationary process $b_t$, which has autocovariance function $\gamma_j(b)$, is independent of the process $a_t$, and hence of $z_t$. Suppose that $\gamma_j(w)$ is the autocovariance function for $w_t = \nabla^d z_t = \phi^{-1}(B)\theta(B)a_t$ and that $W_t = \nabla^d Z_t$. We require the autocovariance function for $W_t$. Now

$$\begin{aligned}
\nabla^d(Z_t - b_t) &= \phi^{-1}(B)\theta(B)a_t \\
W_t &= w_t + v_t
\end{aligned}$$

where

$$v_t = \nabla^d b_t = (1 - B)^d b_t$$

Hence

$$\begin{aligned}
\gamma_j(W) &= \gamma_j(w) + \gamma_j(v) \\
\gamma_j(v) &= (1 - B)^d(1 - F)^d\gamma_j(b) \\
&= (-1)^d(1 - B)^{2d}\gamma_{j+d}(b)
\end{aligned}$$

and

$$\gamma_j(W) = \gamma_j(w) + (-1)^d(1 - B)^{2d}\gamma_{j+d}(b) \tag{A4.3.15}$$

For example, suppose that correlated noise $b_t$ is added to an IMA(0, 1, 1) process defined by $w_t = \nabla z_t = (1 - \theta B)a_t$. Then the autocovariances of the first difference $W_t$ of the “noisy” process will be

$$\begin{aligned}
\gamma_0(W) &= \sigma_a^2(1 + \theta^2) + 2[\gamma_0(b) - \gamma_1(b)] \\
\gamma_1(W) &= -\sigma_a^2\theta + [2\gamma_1(b) - \gamma_0(b) - \gamma_2(b)] \\
\gamma_j(W) &= [2\gamma_j(b) - \gamma_{j-1}(b) - \gamma_{j+1}(b)] \qquad j \ge 2
\end{aligned}$$

In particular, if $b_t$ was first-order autoregressive, so that $b_t = \phi b_{t-1} + \alpha_t$,

$$\begin{aligned}
\gamma_0(W) &= \sigma_a^2(1 + \theta^2) + 2\gamma_0(b)(1 - \phi) \\
\gamma_1(W) &= -\sigma_a^2\theta - \gamma_0(b)(1 - \phi)^2 \\
\gamma_j(W) &= -\gamma_0(b)\phi^{j-1}(1 - \phi)^2 \qquad j \ge 2
\end{aligned}$$

where $\gamma_0(b) = \sigma_\alpha^2/(1 - \phi^2)$. In fact, from (A4.3.5), the resulting noisy process $Z_t = z_t + b_t$ is in this case defined by

$$(1 - \phi B)\nabla Z_t = (1 - \phi B)(1 - \theta B)a_t + (1 - B)\alpha_t$$

which will be of order (1, 1, 2), and for the associated ARMA(1, 2) process $W_t = \nabla Z_t$, we know that the autocovariances satisfy $\gamma_j(W) = \phi\gamma_{j-1}(W)$ for $j \ge 3$ [e.g., see (3.4.3)] as is shown explicitly above.
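
The agreement between the general expression (A4.3.15) with $d = 1$ and the closed forms for first-order autoregressive noise can be confirmed numerically. In the sketch below the function names and the parameter values ($\theta = 0.6$, $\phi = 0.5$, $\sigma_a^2 = 1$, $\sigma_\alpha^2 = 0.75$) are assumptions chosen only for illustration.

```python
def gamma_b(j, phi, var_alpha):
    """Autocovariance at lag j of the AR(1) noise b_t = phi*b_{t-1} + alpha_t."""
    return var_alpha / (1.0 - phi ** 2) * phi ** abs(j)

def gamma_W_general(j, theta, var_a, phi, var_alpha):
    """gamma_j(W) from (A4.3.15) with d = 1 for the IMA(0,1,1) signal."""
    g_w = {0: var_a * (1.0 + theta ** 2), 1: -var_a * theta}.get(j, 0.0)
    g_v = (2.0 * gamma_b(j, phi, var_alpha)
           - gamma_b(j - 1, phi, var_alpha)
           - gamma_b(j + 1, phi, var_alpha))
    return g_w + g_v

def gamma_W_ar1(j, theta, var_a, phi, var_alpha):
    """Closed forms quoted in the text for first-order autoregressive noise."""
    g0 = var_alpha / (1.0 - phi ** 2)
    if j == 0:
        return var_a * (1.0 + theta ** 2) + 2.0 * g0 * (1.0 - phi)
    if j == 1:
        return -var_a * theta - g0 * (1.0 - phi) ** 2
    return -g0 * phi ** (j - 1) * (1.0 - phi) ** 2

params = dict(theta=0.6, var_a=1.0, phi=0.5, var_alpha=0.75)
general = [gamma_W_general(j, **params) for j in range(8)]
closed = [gamma_W_ar1(j, **params) for j in range(8)]
for j in range(8):
    print(j, round(general[j], 6), round(closed[j], 6))
```

For $j \ge 3$ successive values differ by the factor $\phi$, which is the ARMA(1, 2) behavior noted at the end of the section.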


EXERCISES 

4.1. For each of the models

(1) $(1 - B)z_t = (1 - 0.5B)a_t$

(2) $(1 - B)z_t = (1 - 0.2B)a_t$

(3) $(1 - 0.5B)(1 - B)z_t = a_t$

(4) $(1 - 0.2B)(1 - B)z_t = a_t$

(5) $(1 - 0.2B)(1 - B)z_t = (1 - 0.5B)a_t$

(a) Obtain the first seven $\psi_j$ weights.

(b) Obtain the first seven $\pi_j$ weights.

(c) Classify as a member of the class of ARIMA(p, d, q) processes.

4.2. For the five models of Exercise 4.1, and using where appropriate the results there
obtained,

(a) Write each model in random shock form. 

(b) Write each model as a complementary function plus a particular integral in 
relation to an origin k = t — 3. 

(c) Write each model in inverted form. 

4.3. Consider the IMA(0, 2, 2) process with parameters $\theta_1 = 0.8$ and $\theta_2 = -0.4$.

(a) Is the process invertible? If so, what is the expected pattern of the $\pi$ weights?

(b) Calculate and plot the first ten $\pi$ weights for the original series $z_t$ and comment.

(c) Calculate and plot the first ten $\pi$ weights for the differenced series $w_t = (1 - B)^2z_t$.

4.4. Given the following series of random shocks $a_t$, and given that $z_0 = 20$, $z_{-1} = 19$,

  t     a_t       t     a_t       t     a_t
  0    -0.3       5    -0.6      10    -0.4
  1     0.6       6     1.7      11     0.9
  2     0.9       7    -0.9      12     0.0
  3     0.2       8    -1.3      13    -1.4
  4     0.1       9    -0.6      14    -0.6

(a) Use the difference equation form of the model to obtain $z_1, z_2, \ldots, z_{14}$ for each of the five models in Exercise 4.1.

(b) Plot the resulting series.

4.5. Using the inverted forms of each of the models in Exercise 4.1, obtain $z_{12}$, $z_{13}$, and $z_{14}$, using only the values $z_1, z_2, \ldots, z_{11}$ derived in Exercise 4.4 and $a_{12}$, $a_{13}$, and $a_{14}$. Confirm that the values agree with those obtained in Exercise 4.4.

4.6. Consider the IMA(0, 1, 1) model $(1 - B)z_t = (1 - \theta B)a_t$, where the $a_t$ are i.i.d. $N(0, \sigma_a^2)$.

(a) Derive the expected value and variance of $z_t$, $t = 1, 2, \ldots$, assuming that the process starts at time $t = 1$ with $z_0 = 10$.

(b) Derive the correlation coefficient $\rho_k$ between $z_t$ and $z_{t-k}$, conditioning on $z_0 = 10$. Assume that $t$ is much larger than the lag $k$.

(c) Provide an approximate value for the autocorrelation coefficient $\rho_k$ derived in part (b).

4.7. If $\bar{z}_t = \sum_{j=1}^{\infty}\pi_jz_{t+1-j}$, then for models (1) and (2) of Exercise 4.1, which are of the form $(1 - B)z_t = (1 - \theta B)a_t$, $\bar{z}_t$ is an exponentially weighted moving average. For these two models, by actual calculation, confirm that $\bar{z}_{11}$, $\bar{z}_{12}$, and $\bar{z}_{13}$ satisfy the relations

$$\begin{aligned}
z_t &= \bar{z}_{t-1} + a_t \qquad \text{(see Exercise 4.5)} \\
\bar{z}_t &= \bar{z}_{t-1} + (1 - \theta)a_t \\
&= (1 - \theta)z_t + \theta\bar{z}_{t-1}
\end{aligned}$$

4.8. If $w_{1t} = (1 - \theta_1B)a_{1t}$ and $w_{2t} = (1 - \theta_2B)a_{2t}$, show that $w_{3t} = w_{1t} + w_{2t}$ may be written as $w_{3t} = (1 - \theta_3B)a_{3t}$, and derive an expression for $\theta_3$ and $\sigma_{a_3}^2$ in terms of the parameters of the other two processes. State your assumptions.

4.9. Suppose that $Z_t = z_t + b_t$, where $z_t$ is a first-order autoregressive process $(1 - \phi B)z_t = a_t$ and $b_t$ is a white noise process with variance $\sigma_b^2$. What model does the process $Z_t$ follow? State your assumptions.

4.10. (a) Simulate a time series of N = 200 observations from an IMA(0, 2, 2) model with parameters $\theta_1 = 0.8$ and $\theta_2 = -0.4$ using the arima.sim() function in R; type help(arima.sim) for details. Plot the resulting series and comment on its behavior.

(b) Estimate and plot the autocorrelation function of the simulated time series. 

(c) Estimate and plot the autocorrelation functions of the first and second differences 
of the series. 

(d) Comment on the patterns of the autocorrelation functions generated above. Are 
the results consistent with what you would expect to see for this IMA(0, 2, 2) 
process? 

4.11. Download the daily S&P 500 Index stock price values for the period January 2, 2014 to present from the Internet (e.g., http://research.stlouisfed.org).

(a) Plot the series using R. Calculate and graph the autocorrelation and partial autocorrelation functions for this series. Does the series appear to be stationary?

(b) Repeat the calculations in part (a) for the first and second differences of the series. Describe the effects of differencing in this case. Can you suggest a model that might be appropriate for this series?

(c) The return or relative gain on a stock can be calculated as $(z_t - z_{t-1})/z_{t-1}$ or $\log(z_t) - \log(z_{t-1})$. Perform this calculation and comment on the stationarity of the resulting series.

4.12. Repeat the analysis in Exercise 4.11 for the Dow Jones Industrial Average, or for a time series of your own choosing.


5

FORECASTING 


In Chapter 4, we discussed the properties of autoregressive integrated moving average 
(ARIMA) models and examined in detail some special cases that appear to be common in 
practice. We will now show how these models may be used to forecast future values of 
an observed time series. In Part Two, we will consider the problem of selecting a suitable 
model of this form and fitting it to actual data. For the present, however, we proceed as 
if the model were known exactly, bearing in mind that estimation errors in the parameters 
will not seriously affect the forecasts unless the time series is relatively short. 

This chapter will focus on nonseasonal time series. The forecasting, as well as model 
fitting, of seasonal time series is described in Chapter 9. We show how minimum mean 
square error (MSE) forecasts may be generated directly from the difference equation form 
of the model. A further recursive calculation yields probability limits for the forecasts. It is 
emphasized that for practical computation of the forecasts, this approach via the difference 
equation is the simplest and most elegant. However, to provide insight into the nature of 
the forecasts, we also consider them from other viewpoints. As a computational tool, we 
also demonstrate how to generate forecasts and associated probability limits using the R 
software. 


5.1 MINIMUM MEAN SQUARE ERROR FORECASTS AND THEIR 
PROPERTIES 

In Section 4.2, we discussed three explicit forms for the general ARIMA model:

$$\varphi(B)z_t = \theta(B)a_t \tag{5.1.1}$$


Time Series Analysis: Forecasting and Control, Fifth Edition. George E. P. Box, Gwilym M. Jenkins,
Gregory C. Reinsel, and Greta M. Ljung
where $\varphi(B) = \phi(B)\nabla^d$. We begin by recalling these three forms since each one sheds light on a different aspect of the forecasting problem.

We will consider forecasting a value $z_{t+l}$, $l \ge 1$, when we are currently at time $t$. This forecast is said to be made at origin $t$ for lead time $l$. We now summarize the results of Section 4.2, but writing $t + l$ for $t$ and $t$ for $k$.

Three Explicit Forms for the Model. An observation $z_{t+l}$ generated by the ARIMA process may be expressed as follows:

1. Directly in terms of the difference equation by

$$z_{t+l} = \varphi_1z_{t+l-1} + \cdots + \varphi_{p+d}z_{t+l-p-d} - \theta_1a_{t+l-1} - \cdots - \theta_qa_{t+l-q} + a_{t+l} \tag{5.1.2}$$

2. As an infinite weighted sum of current and previous shocks $a_j$:

$$z_{t+l} = \sum_{j=0}^{\infty}\psi_ja_{t+l-j} \tag{5.1.3}$$

where $\psi_0 = 1$ and, as in (4.2.5), the $\psi$ weights may be obtained by equating coefficients in

$$\varphi(B)(1 + \psi_1B + \psi_2B^2 + \cdots) = \theta(B) \tag{5.1.4}$$

Equivalently, for positive $l$, with reference to origin $k < t$, the model may be written in the truncated form:

$$\begin{aligned}
z_{t+l} &= a_{t+l} + \psi_1a_{t+l-1} + \cdots + \psi_{l-1}a_{t+1} \\
&\quad + \psi_la_t + \cdots + \psi_{t+l-k-1}a_{k+1} + C_k(t + l - k) \\
&= a_{t+l} + \psi_1a_{t+l-1} + \cdots + \psi_{l-1}a_{t+1} + C_t(l)
\end{aligned} \tag{5.1.5}$$

where $C_k(t + l - k)$ is the complementary function relative to the finite origin $k$ of the process. From (4.2.19), we recall that the complementary function relative to the forecast origin $t$ can be expressed as $C_t(l) = C_k(t + l - k) + \psi_la_t + \psi_{l+1}a_{t-1} + \cdots + \psi_{t+l-k-1}a_{k+1}$. Informally, $C_t(l)$ is associated with the truncated infinite sum:

$$C_t(l) = \sum_{j=l}^{\infty}\psi_ja_{t+l-j} \tag{5.1.6}$$

3. As an infinite weighted sum of previous observations, plus a random shock,

$$z_{t+l} = \sum_{j=1}^{\infty}\pi_jz_{t+l-j} + a_{t+l} \tag{5.1.7}$$

Also, if $d \ge 1$,

$$\bar{z}_{t+l-1}(\pi) = \sum_{j=1}^{\infty}\pi_jz_{t+l-j} \tag{5.1.8}$$

will be a weighted average, since then $\sum_{j=1}^{\infty}\pi_j = 1$. As in (4.2.22), the $\pi$ weights may be obtained from

$$\varphi(B) = (1 - \pi_1B - \pi_2B^2 - \cdots)\theta(B) \tag{5.1.9}$$
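
Equating coefficients of $B^j$ in (5.1.9) gives the recursion $\pi_j = \varphi_j - \theta_j + \sum_{i=1}^{j-1}\theta_i\pi_{j-i}$, with $\varphi_j = 0$ for $j > p + d$ and $\theta_j = 0$ for $j > q$. The following sketch implements this recursion; the function name `pi_weights` and the IMA(0, 1, 1) example with $\theta = 0.5$ are illustrative assumptions.

```python
def pi_weights(phi, theta, n):
    """First n pi weights from varphi(B) = (1 - pi_1 B - ...) theta(B), as in (5.1.9).

    phi   -- [varphi_1, ..., varphi_{p+d}] of the generalized AR operator
    theta -- [theta_1, ..., theta_q]
    """
    pi = []                                   # pi[m - 1] holds pi_m
    for j in range(1, n + 1):
        phi_j = phi[j - 1] if j <= len(phi) else 0.0
        theta_j = theta[j - 1] if j <= len(theta) else 0.0
        acc = phi_j - theta_j
        for i in range(1, min(j - 1, len(theta)) + 1):
            acc += theta[i - 1] * pi[j - i - 1]
        pi.append(acc)
    return pi

# IMA(0,1,1): varphi(B) = 1 - B, theta(B) = 1 - 0.5B
weights = pi_weights([1.0], [0.5], 30)
print([round(w, 4) for w in weights[:5]], round(sum(weights), 6))
```

For this model the weights are $\pi_j = (1 - \theta)\theta^{j-1}$, and their sum approaches 1, which is the weighted-average property stated for $d \ge 1$.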


5.1.1 Derivation of the Minimum Mean Square Error Forecasts 

Now suppose, at origin $t$, that we are to make a forecast $\hat{z}_t(l)$ of $z_{t+l}$, which is to be a linear function of current and previous observations $z_t, z_{t-1}, z_{t-2}, \ldots$. Then, it will also be a linear function of current and previous shocks $a_t, a_{t-1}, a_{t-2}, \ldots$.

Suppose, then, that the best forecast is

$$\hat{z}_t(l) = \psi_l^*a_t + \psi_{l+1}^*a_{t-1} + \psi_{l+2}^*a_{t-2} + \cdots$$

where the weights $\psi_l^*, \psi_{l+1}^*, \ldots$ are to be determined. Then, using (5.1.3), the mean square error of the forecast is

$$E[z_{t+l} - \hat{z}_t(l)]^2 = (1 + \psi_1^2 + \cdots + \psi_{l-1}^2)\sigma_a^2 + \sum_{j=0}^{\infty}(\psi_{l+j} - \psi_{l+j}^*)^2\sigma_a^2 \tag{5.1.10}$$

which is minimized by setting $\psi_{l+j}^* = \psi_{l+j}$. This conclusion is a special case of more general results in prediction theory (Wold, 1938; Kolmogoroff, 1939, 1941a, 1941b; Wiener, 1949; Whittle, 1963). We then have

$$\begin{aligned}
z_{t+l} &= (a_{t+l} + \psi_1a_{t+l-1} + \cdots + \psi_{l-1}a_{t+1}) + (\psi_la_t + \psi_{l+1}a_{t-1} + \cdots) && \text{(5.1.11)} \\
&= e_t(l) + \hat{z}_t(l) && \text{(5.1.12)}
\end{aligned}$$

where $e_t(l)$ is the error of the forecast $\hat{z}_t(l)$ at lead time $l$.

Certain important facts emerge. As before, denote $E[z_{t+l} \mid z_t, z_{t-1}, \ldots]$, the conditional expectation of $z_{t+l}$ given the knowledge of all the $z$'s up to time $t$, by $E_t[z_{t+l}]$. We will assume that the $a_t$ are a sequence of independent random variables.

1. Then, $E[a_{t+j} \mid z_t, z_{t-1}, \ldots] = 0$, $j > 0$, and so from (5.1.3),

$$\hat{z}_t(l) = \psi_la_t + \psi_{l+1}a_{t-1} + \cdots = E_t[z_{t+l}] \tag{5.1.13}$$

Thus, the minimum mean square error forecast at origin $t$, for lead time $l$, is the conditional expectation of $z_{t+l}$ at time $t$. When $\hat{z}_t(l)$ is regarded as a function of $l$ for fixed $t$, it will be called the forecast function for origin $t$. We note that a minimal requirement on the random shocks $a_t$ in the model (5.1.1) in order for the conditional expectation $E_t[z_{t+l}]$, which always equals the minimum mean square error forecast, to coincide with the minimum mean square error linear forecast is that $E_t[a_{t+j}] = 0$, $j > 0$. This property may not hold for certain types of nonlinear processes studied, for example, by Priestley (1988), Tong (1983, 1990), and many subsequent authors. Such processes may, in fact, possess a linear representation as in (5.1.1), but the shocks $a_t$ will not be independent, only uncorrelated, and the best forecast $E_t[z_{t+l}]$ may not coincide with the best linear forecast $\hat{z}_t(l)$ as obtained in (5.1.11).

2. The forecast error for lead time $l$ is

$$e_t(l) = a_{t+l} + \psi_1a_{t+l-1} + \cdots + \psi_{l-1}a_{t+1} \tag{5.1.14}$$

Since

$$E_t[e_t(l)] = 0 \tag{5.1.15}$$

the forecast is unbiased. Also, the variance of the forecast error is

$$V(l) = \operatorname{var}[e_t(l)] = (1 + \psi_1^2 + \psi_2^2 + \cdots + \psi_{l-1}^2)\sigma_a^2 \tag{5.1.16}$$

3. It is readily shown that not only is $\hat{z}_t(l)$ the minimum mean square error forecast of $z_{t+l}$, but that any linear function $\sum_{l=1}^{L}w_l\hat{z}_t(l)$ of the forecasts is also a minimum mean square error forecast of the corresponding linear function $\sum_{l=1}^{L}w_lz_{t+l}$ of the future observations. For example, suppose that using (5.1.13), we have obtained, from monthly data, minimum mean square error forecasts $\hat{z}_t(1)$, $\hat{z}_t(2)$, and $\hat{z}_t(3)$ of the sales of a product 1, 2, and 3 months ahead. Then, it is true that $\hat{z}_t(1) + \hat{z}_t(2) + \hat{z}_t(3)$ is the minimum mean square error forecast of the sales $z_{t+1} + z_{t+2} + z_{t+3}$ during the next quarter.

4. The Shocks as One-Step-Ahead Forecast Errors. Using (5.1.14), the one-step-ahead forecast error is

$$e_t(1) = z_{t+1} - \hat{z}_t(1) = a_{t+1} \tag{5.1.17}$$

Hence, the shocks $a_t$, which generate the process, and which have been introduced so far merely as a set of independent random variables or shocks, turn out to be the one-step-ahead forecast errors.

It follows that for a minimum mean square error forecast, the one-step-ahead forecast errors must be uncorrelated. This makes sense, for if the one-step-ahead errors were correlated, the forecast error $a_{t+1}$ could, to some extent, be predicted from available forecast errors $a_t, a_{t-1}, a_{t-2}, \ldots$. If the prediction so obtained was $\hat{a}_{t+1}$, then $\hat{z}_t(1) + \hat{a}_{t+1}$ would be a better forecast of $z_{t+1}$ than was $\hat{z}_t(1)$.

5. Correlation between the Forecast Errors. Although the optimal forecast errors at lead time 1 will be uncorrelated, the forecast errors for longer lead times in general will be correlated. In Section A5.1.1, we derive a general expression for the correlation between the forecast errors $e_t(l)$ and $e_{t-j}(l)$, made at the same lead time $l$ from different origins $t$ and $t - j$.

Now, it is also true that forecast errors $e_t(l)$ and $e_t(l + j)$, made at different lead times from the same origin $t$, are correlated. One consequence of this is that there will often be a tendency for the forecast function to lie either wholly above or below the values of the series, when they eventually come to hand. In Section A5.1.2, we give a general expression for the correlation between the forecast errors $e_t(l)$ and $e_t(l + j)$, made from the same origin.

5.1.2 Three Basic Forms for the Forecast 

We have seen that the minimum mean square error forecast $\hat{z}_t(l)$ for lead time $l$ is the conditional expectation $E_t[z_{t+l}]$, of $z_{t+l}$, at origin $t$. Using this fact, we can write expressions for the forecast in any one of three different ways, corresponding to the three ways of expressing the model summarized earlier in this section. To simplify the notation, we will temporarily adopt the convention that square brackets imply that the conditional expectation, at time $t$, is to be taken. Thus,

$$[a_{t+l}] = E_t[a_{t+l}] \qquad [z_{t+l}] = E_t[z_{t+l}]$$

For $l > 0$, the following are three different ways of expressing the forecasts:

1. Forecasts from Difference Equation. Taking conditional expectations at time $t$ in (5.1.2), we obtain

$$[z_{t+l}] = \hat{z}_t(l) = \varphi_1[z_{t+l-1}] + \cdots + \varphi_{p+d}[z_{t+l-p-d}] - \theta_1[a_{t+l-1}] - \cdots - \theta_q[a_{t+l-q}] + [a_{t+l}] \tag{5.1.18}$$

2. Forecasts in Integrated Form. Use of (5.1.3) gives

$$[z_{t+l}] = \hat{z}_t(l) = [a_{t+l}] + \psi_1[a_{t+l-1}] + \cdots + \psi_{l-1}[a_{t+1}] + \psi_l[a_t] + \psi_{l+1}[a_{t-1}] + \cdots \tag{5.1.19}$$

yielding the form (5.1.13) discussed above. Alternatively, using the truncated form of the model (5.1.5), we have

$$\begin{aligned}
[z_{t+l}] = \hat{z}_t(l) &= [a_{t+l}] + \psi_1[a_{t+l-1}] + \cdots + \psi_{t+l-k-1}[a_{k+1}] + C_k(t + l - k) \\
&= [a_{t+l}] + \psi_1[a_{t+l-1}] + \cdots + \psi_{l-1}[a_{t+1}] + C_t(l)
\end{aligned} \tag{5.1.20}$$

where $C_t(l)$ is the complementary function at origin $t$.

3. Forecasts as a Weighted Average of Previous Observations and Forecasts Made at Previous Lead Times from the Same Origin. Finally, taking conditional expectations in (5.1.7) yields

$$[z_{t+l}] = \hat{z}_t(l) = \sum_{j=1}^{\infty}\pi_j[z_{t+l-j}] + [a_{t+l}] \tag{5.1.21}$$

It is to be noted that the minimum mean square error forecast is defined in terms of the conditional expectation

$$[z_{t+l}] = E_t[z_{t+l}] = E[z_{t+l} \mid z_t, z_{t-1}, \ldots]$$

which theoretically requires knowledge of the $z$'s stretching back into the infinite past. However, the requirement of invertibility imposed on the ARIMA model ensures that the $\pi$ weights in (5.1.21) form a convergent series. Hence, for the computation of a forecast, the dependence on $z_{t-j}$ for $j > k$ can typically be ignored. In practice, the $\pi$ weights usually decay rather quickly, so whatever form of the model is employed, only a moderate length of series $z_t, z_{t-1}, \ldots, z_{t-k}$ is needed to calculate the forecasts to sufficient accuracy. The methods we discuss are easily modified to calculate the exact finite sample forecasts, $E[z_{t+l} \mid z_t, z_{t-1}, \ldots, z_1]$, based on the finite length of data $z_t, z_{t-1}, \ldots, z_1$.

To calculate the conditional expectations in expressions (5.1.18)-(5.1.21), we note that if $j$ is a nonnegative integer,

$$\begin{aligned}
[z_{t-j}] &= E_t[z_{t-j}] = z_{t-j} && j = 0, 1, 2, \ldots \\
[z_{t+j}] &= E_t[z_{t+j}] = \hat{z}_t(j) && j = 1, 2, \ldots \\
[a_{t-j}] &= E_t[a_{t-j}] = a_{t-j} = z_{t-j} - \hat{z}_{t-j-1}(1) && j = 0, 1, 2, \ldots \\
[a_{t+j}] &= E_t[a_{t+j}] = 0 && j = 1, 2, \ldots
\end{aligned}$$

Therefore, to obtain the forecast $\hat{z}_t(l)$, one writes down the model for $z_{t+l}$ in any one of the three explicit forms above and treats the terms on the right according to the following rules:

1. The $z_{t-j}$ ($j = 0, 1, 2, \ldots$), which have already occurred at origin $t$, are left unchanged.

2. The $z_{t+j}$ ($j = 1, 2, \ldots$), which have not yet occurred, are replaced by their forecasts $\hat{z}_t(j)$ at origin $t$.

3. The $a_{t-j}$ ($j = 0, 1, 2, \ldots$), which have occurred, are available from $z_{t-j} - \hat{z}_{t-j-1}(1)$.

4. The $a_{t+j}$ ($j = 1, 2, \ldots$), which have not yet occurred, are replaced by zeros.

For routine calculation, it is easiest to work directly with the difference equation form (5.1.18). Hence, the forecasts for $l = 1, 2, \ldots$ are calculated recursively as

$$\hat{z}_t(l) = \sum_{j=1}^{p+d}\varphi_j\hat{z}_t(l - j) - \sum_{j=l}^{q}\theta_ja_{t+l-j} \tag{5.1.22}$$

where $\hat{z}_t(-j) = [z_{t-j}]$ denotes the observed value $z_{t-j}$ for $j \ge 0$, and the moving average terms are not present for lead times $l > q$.

Example: Forecasting Using the Difference Equation Form. We will show in Chapter 7 that the viscosity data in Series C can be represented by the model

$$(1 - 0.8B)(1 - B)z_{t+l} = a_{t+l}$$

that is,

$$(1 - 1.8B + 0.8B^2)z_{t+l} = a_{t+l}$$

or

$$z_{t+l} = 1.8z_{t+l-1} - 0.8z_{t+l-2} + a_{t+l}$$

The forecasts at origin $t$ are given by

$$\begin{aligned}
\hat{z}_t(1) &= 1.8z_t - 0.8z_{t-1} \\
\hat{z}_t(2) &= 1.8\hat{z}_t(1) - 0.8z_t \\
\hat{z}_t(l) &= 1.8\hat{z}_t(l - 1) - 0.8\hat{z}_t(l - 2) \qquad l = 3, 4, \ldots
\end{aligned} \tag{5.1.23}$$

yielding a simple recursive calculation.

There are no moving average terms in this model. However, such terms produce no added difficulties. Later in this chapter, we have a series arising in a control problem, for which the model at time $t + l$ is

$$\nabla^2z_{t+l} = (1 - 0.9B + 0.5B^2)a_{t+l}$$

or, equivalently, $z_{t+l} = 2z_{t+l-1} - z_{t+l-2} + a_{t+l} - 0.9a_{t+l-1} + 0.5a_{t+l-2}$. Then,

$$\begin{aligned}
\hat{z}_t(1) &= 2z_t - z_{t-1} - 0.9a_t + 0.5a_{t-1} \\
\hat{z}_t(2) &= 2\hat{z}_t(1) - z_t + 0.5a_t \\
\hat{z}_t(l) &= 2\hat{z}_t(l - 1) - \hat{z}_t(l - 2) \qquad l = 3, 4, \ldots
\end{aligned}$$

In these expressions, we remember that $a_t = z_t - \hat{z}_{t-1}(1)$, $a_{t-1} = z_{t-1} - \hat{z}_{t-2}(1)$, and the forecasting process may be started off initially by setting unknown $a$ values equal to their unconditional expected values of zero. Thus, assuming by convention that data are available starting from time $s = 1$, the necessary $a_s$'s are computed recursively from the difference equation form (5.1.2) of the model:

$$a_s = z_s - \hat{z}_{s-1}(1) = z_s - \left(\sum_{j=1}^{p+d}\varphi_jz_{s-j} - \sum_{j=1}^{q}\theta_ja_{s-j}\right) \qquad s = p + d + 1, \ldots, t$$

setting initial $a_s$'s equal to zero, for $s < p + d + 1$. Alternatively, it is possible to estimate the necessary initial $a_s$'s, as well as the initial $z_s$'s, using back-forecasting. This technique, which essentially determines the conditional expectations of the presample $a_s$'s and $z_s$'s, given the available data, is discussed in Chapter 7 with regard to parameter estimation of ARIMA models. However, provided that a sufficient length of data series $z_t, z_{t-1}, \ldots, z_1$ is available, the two different treatments of the initial values will have a negligible effect on the forecasts $\hat{z}_t(l)$.
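
The recursive scheme just described, with start-up shocks set to zero, can be sketched as follows. The function names `residuals` and `forecast` and the short data series are hypothetical; the coefficients are those of the control-problem model $\nabla^2z_t = (1 - 0.9B + 0.5B^2)a_t$, so that $\varphi_1 = 2$, $\varphi_2 = -1$, $\theta_1 = 0.9$, and $\theta_2 = -0.5$.

```python
def residuals(z, phi, theta):
    """One-step-ahead errors a_s = z_s - zhat_{s-1}(1); start-up values are 0."""
    m = len(phi)                                  # m = p + d
    a = [0.0] * len(z)
    for s in range(m, len(z)):
        ar = sum(phi[j] * z[s - 1 - j] for j in range(m))
        ma = sum(theta[j] * a[s - 1 - j]
                 for j in range(len(theta)) if s - 1 - j >= 0)
        a[s] = z[s] - (ar - ma)
    return a

def forecast(z, phi, theta, L):
    """Forecasts zhat_t(1), ..., zhat_t(L) at origin t = last index of z."""
    a = residuals(z, phi, theta)
    t = len(z) - 1
    path = list(z)                                # observations, then forecasts
    for l in range(1, L + 1):
        ar = sum(phi[j] * path[t + l - 1 - j] for j in range(len(phi)))
        # Only shocks already observed (index <= t) contribute; future ones are 0
        ma = sum(theta[j] * a[t + l - 1 - j]
                 for j in range(len(theta)) if 0 <= t + l - 1 - j <= t)
        path.append(ar - ma)
    return path[t + 1:]

z = [20.0, 20.5, 21.3, 21.8, 22.9, 23.1, 24.0, 24.2]   # hypothetical data
phi, theta = [2.0, -1.0], [0.9, -0.5]
f = forecast(z, phi, theta, 4)
print([round(v, 4) for v in f])
```

With so short a series the zero start-up values still influence the result; as the text notes, that influence becomes negligible once a sufficient length of data is available.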


5.2 CALCULATING FORECASTS AND PROBABILITY LIMITS 
5.2.1 Calculation of y/ Weights 

It is often the case that forecasts are needed for several lead times 1,2,... ,L. As already 
shown, the difference equation form of the model allows the forecasts to be generated 
recursively in the order z r (l), z r (2), z f (3), and so on. To obtain probability limits for these 
forecasts, it is necessary to calculate the weights \p ] , \p 2 , ... ,yr L _\- This is accomplished 
using the relation 


tp(B)y/B = 0(B) (5.2.1) 

that is, by equating coefficients of powers of B in 

(1 ~V\B - <p p+d B p+d ) (1 + \g x B + i p 2 B 2 + •••) 

= (1 -6 { B-0 2 B 2 - 6 q B q ) 


(5.2.2) 
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Knowing the values of the (p's and the 0’s, the values of y/ may be obtained as follows: 


¥\ = <P i~0\ 

¥2 = V\¥i + V 2 -O 2 

¥j = <PjVj -1 + - + (Pp+dVj-p-d - Oj 


(5.2.3) 


where i// 0 = 1, 1 // ■ = 0 for j < 0, and 0j = 0 for j > q. If K is the greater of the integers 
p + d — 1 and q , then for j > K the i//’s satisfy the difference equation: 


Vj = <Pi¥j-i + <P2Vj-2 + •" + <P p+ d¥j- p -d (5.2.4) 

Thus, the $\psi$'s are easily calculated recursively. For example, for the model
$(1 - 1.8B + 0.8B^2)z_t = a_t$, appropriate to Series C, we have

$$(1 - 1.8B + 0.8B^2)(1 + \psi_1 B + \psi_2 B^2 + \cdots) = 1$$

Hence, with $\varphi_1 = 1.8$ and $\varphi_2 = -0.8$, we obtain

$$\begin{aligned}
\psi_0 &= 1 \\
\psi_1 &= 1.8 \\
\psi_j &= 1.8\psi_{j-1} - 0.8\psi_{j-2} \qquad j = 2, 3, 4, \ldots
\end{aligned}$$

so that $\psi_2 = (1.8 \times 1.8) - (0.8 \times 1.0) = 2.44$ and $\psi_3 = (1.8 \times 2.44) - (0.8 \times 1.8) = 2.95$,
and so on.
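
The recursion (5.2.3) is easy to program. The following R sketch is illustrative only: the helper name psi_weights and its arguments are assumptions made for this example and do not come from the text or from any package. It takes the coefficients of $\varphi(B)$ and $\theta(B)$ in the sign convention used here and reproduces the first few Series C values.

```r
# Sketch: psi weights from (5.2.3); psi_weights() is a hypothetical helper.
# varphi: coefficients of the generalized AR operator 1 - varphi[1] B - ...
# theta : coefficients of the MA operator 1 - theta[1] B - ...
psi_weights <- function(varphi, theta = numeric(0), n) {
  psi <- numeric(n)                      # psi[j] holds psi_j, j = 1, ..., n
  for (j in seq_len(n)) {
    acc <- if (j <= length(theta)) -theta[j] else 0   # theta_j = 0 for j > q
    for (i in seq_along(varphi)) {
      # psi_0 = 1 and psi_k = 0 for k < 0
      prev <- if (j - i > 0) psi[j - i] else if (j - i == 0) 1 else 0
      acc <- acc + varphi[i] * prev
    }
    psi[j] <- acc
  }
  psi
}

# Series C model (1 - 1.8B + 0.8B^2) z_t = a_t: varphi_1 = 1.8, varphi_2 = -0.8
psi_C <- psi_weights(c(1.8, -0.8), n = 5)
print(round(psi_C[1:3], 2))              # psi_1, psi_2, psi_3
```

Base R also provides ARMAtoMA() in the stats package for this kind of expansion. It writes the moving average operator with plus signs, so the $\theta$'s would need to be negated, and its output should be cross-checked before it is relied on.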

Before proceeding to discuss the probability limits, we briefly mention the use of the $\psi$
weights for updating of forecasts as new data become available.

5.2.2 Use of the $\psi$ Weights in Updating the Forecasts

Using (5.1.13), we can express the forecasts $\hat{z}_{t+1}(l)$ and $\hat{z}_t(l + 1)$ of the future observation
$z_{t+l+1}$ made at origins $t + 1$ and $t$ as

$$\begin{aligned}
\hat{z}_{t+1}(l) &= \psi_l a_{t+1} + \psi_{l+1} a_t + \psi_{l+2} a_{t-1} + \cdots \\
\hat{z}_t(l + 1) &= \psi_{l+1} a_t + \psi_{l+2} a_{t-1} + \cdots
\end{aligned}$$

On subtraction, it follows that

$$\hat{z}_{t+1}(l) = \hat{z}_t(l + 1) + \psi_l a_{t+1} \tag{5.2.5}$$

Explicitly, the $t$-origin forecast of $z_{t+l+1}$ can be updated to become the $t + 1$ origin forecast
of the same $z_{t+l+1}$, by adding a constant multiple of the one-step-ahead forecast error
$a_{t+1} = z_{t+1} - \hat{z}_t(1)$ with multiplier $\psi_l$.

This leads to a rather remarkable conclusion. Suppose that we currently have forecasts
at origin $t$ for lead times $1, 2, \ldots, L$. Then, as soon as $z_{t+1}$ becomes available, we can
calculate $a_{t+1} = z_{t+1} - \hat{z}_t(1)$ and proportionally update to obtain forecasts
$\hat{z}_{t+1}(l) = \hat{z}_t(l + 1) + \psi_l a_{t+1}$ at origin $t + 1$, for lead times $1, 2, \ldots, L - 1$. The new forecast $\hat{z}_{t+1}(L)$, for
lead time $L$, cannot be calculated by this means but is easily obtained from the forecasts at
shorter lead times, using the difference equation.
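
As a check on the updating rule (5.2.5), the R sketch below compares updated forecasts with forecasts recomputed from scratch at the new origin for the Series C model. The observations and the helper forecast_ar2 are hypothetical and serve only to illustrate the arithmetic.

```r
# Sketch of the updating rule (5.2.5) for the model (1 - 1.8B + 0.8B^2) z_t = a_t.
# All z values below are made up for illustration.
varphi <- c(1.8, -0.8)
L <- 5

# Forecasts at one origin from the difference equation (q = 0, so no a terms)
forecast_ar2 <- function(z_t, z_tm1, L) {
  f <- numeric(L)
  prev1 <- z_t; prev2 <- z_tm1
  for (l in seq_len(L)) {
    f[l] <- varphi[1] * prev1 + varphi[2] * prev2
    prev2 <- prev1; prev1 <- f[l]
  }
  f
}

# psi weights by the recursion, with psi_0 = 1
psi <- numeric(L)
psi[1] <- varphi[1]
psi[2] <- varphi[1] * psi[1] + varphi[2]
for (j in 3:L) psi[j] <- varphi[1] * psi[j - 1] + varphi[2] * psi[j - 2]

z_tm1 <- 10.0; z_t <- 10.4               # hypothetical last two observations
old <- forecast_ar2(z_t, z_tm1, L)       # origin t, leads 1..L

z_new <- 10.9                            # hypothetical observation z_{t+1}
a_new <- z_new - old[1]                  # one-step-ahead forecast error
updated <- old[2:L] + psi[1:(L - 1)] * a_new   # origin t+1, leads 1..L-1

direct <- forecast_ar2(z_new, z_t, L)    # recomputed at origin t+1
print(cbind(updated, direct = direct[1:(L - 1)]))
```

The two columns agree, while the lead $L$ forecast at the new origin is only available from the direct recursion.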
TABLE 5.1 Variance Function for Series C

$l$                  1     2     3      4      5      6      7      8      9      10
$V(l)/\sigma_a^2$    1.00  4.24  10.19  18.96  30.24  43.86  59.46  76.79  95.52  115.41


5.2.3 Calculation of the Probability Limits at Different Lead Times 

The expression (5.1.16) shows that the variance of the $l$-steps-ahead forecast error for any
origin $t$ is the expected value of

$$e_t^2(l) = [z_{t+l} - \hat{z}_t(l)]^2$$

and is given by

$$V(l) = \left(1 + \sum_{j=1}^{l-1} \psi_j^2\right)\sigma_a^2$$

For example, using the $\psi$ weights calculated above, the function $V(l)/\sigma_a^2$ for Series C is
shown in Table 5.1.
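
The variance function is a cumulative sum of squared $\psi$ weights, which the following R sketch makes explicit for the Series C model. It is a minimal illustration under the model stated above; entries for larger $l$ may differ slightly from a printed table depending on how the $\psi$ weights were rounded.

```r
# Sketch: V(l)/sigma_a^2 = 1 + psi_1^2 + ... + psi_{l-1}^2 for the Series C model
psi <- numeric(9)
psi[1] <- 1.8
psi[2] <- 1.8 * psi[1] - 0.8 * 1         # uses psi_0 = 1
for (j in 3:9) psi[j] <- 1.8 * psi[j - 1] - 0.8 * psi[j - 2]

V_ratio <- cumsum(c(1, psi^2))           # entries for l = 1, ..., 10
print(round(V_ratio[1:3], 2))            # leading entries of the variance function
```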

Assuming that the $a$'s are normal, it follows that given information up to time $t$, the
conditional probability distribution $p(z_{t+l} \mid z_t, z_{t-1}, \ldots)$ of a future value $z_{t+l}$ of the process
will be normal with mean $\hat{z}_t(l)$ and standard deviation

$$\sigma(l) = \left(1 + \sum_{j=1}^{l-1} \psi_j^2\right)^{1/2} \sigma_a$$

Thus, the variate $(z_{t+l} - \hat{z}_t(l))/\sigma(l)$ will have a unit normal distribution and so
$\hat{z}_t(l) \pm u_{\varepsilon/2}\sigma(l)$ provides limits of an interval such that $z_{t+l}$ will lie within the interval with
probability $1 - \varepsilon$, where $u_{\varepsilon/2}$ is the deviate exceeded by a proportion $\varepsilon/2$ of the unit normal
distribution. Figure 5.1 shows the conditional probability distributions of future values
$z_{21}, z_{22}, z_{23}$ for Series C, given information up to origin $t = 20$.

We show in Chapter 7 how an estimate $s_a^2$ of the variance $\sigma_a^2$ may be obtained from
time series data. When the number of observations on which this estimate is based is, say,
at least 50, $s_a$ may be substituted for $\sigma_a$ and approximate $1 - \varepsilon$ probability limits $z_{t+l}(-)$
and $z_{t+l}(+)$ for $z_{t+l}$ will be given by

$$z_{t+l}(\pm) = \hat{z}_t(l) \pm u_{\varepsilon/2}\left(1 + \sum_{j=1}^{l-1} \psi_j^2\right)^{1/2} s_a \tag{5.2.6}$$

It follows from Table 7.6 that for Series C, $s_a = 0.134$; hence, the 50 and 95% limits
for $z_{t+2}$, for example, are given by

50% limits: $\hat{z}_t(2) \pm (0.674)(1 + 1.8^2)^{1/2}(0.134) = \hat{z}_t(2) \pm 0.19$
95% limits: $\hat{z}_t(2) \pm (1.960)(1 + 1.8^2)^{1/2}(0.134) = \hat{z}_t(2) \pm 0.55$
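
The half-widths in (5.2.6) can be evaluated directly. The R sketch below does so for lead 2 with the quoted estimate $s_a = 0.134$, using qnorm() for the normal deviates; the variable names are chosen for this illustration only. Direct evaluation gives roughly 0.19 and 0.54, which agrees with the quoted values up to rounding in the last digit.

```r
# Sketch: half-widths of the 50% and 95% limits at lead 2 for Series C
s_a   <- 0.134                           # estimate quoted from Table 7.6
psi   <- c(1.8)                          # psi_1 is all that lead 2 requires
sd_l2 <- sqrt(1 + sum(psi^2)) * s_a      # estimated sigma(2)

u50 <- qnorm(1 - 0.50 / 2)               # deviate exceeded with probability 0.25
u95 <- qnorm(1 - 0.05 / 2)               # deviate exceeded with probability 0.025
half_width <- c("50%" = u50, "95%" = u95) * sd_l2
print(round(half_width, 2))
```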

FIGURE 5.1 Conditional probability distributions of future values $z_{21}$, $z_{22}$, and $z_{23}$ for Series C,
given information up to origin $t = 20$.

Figure 5.2 shows a section of Series C together with the several-steps-ahead forecasts
(indicated by crosses) from origins $t = 20$ and $t = 67$. Also shown are the 50 and 95%
probability limits for $z_{20+l}$, for $l = 1$ to 14. The interpretation of the limits $z_{t+l}(-)$ and
$z_{t+l}(+)$ should be noted carefully. These limits are such that given the information available
at origin $t$, there is a probability of $1 - \varepsilon$ that the actual value $z_{t+l}$, when it occurs, will be
within them, that is,

$$\Pr\{z_{t+l}(-) < z_{t+l} < z_{t+l}(+)\} = 1 - \varepsilon$$

Also, the probabilities quoted apply to individual forecasts and not jointly to the forecasts
at different lead times. For example, it is true that with 95% probability, the limits for lead
time 10 will include the value $z_{t+10}$ when it occurs. It is not true that the series can be
expected to remain within all the limits simultaneously with this probability.

5.2.4 Calculation of Forecasts Using R 

FIGURE 5.2 Forecasts for Series C and probability limits.

FIGURE 5.3 Forecasts for Series C with $\pm 2$ prediction error limits generated using R.

Forecasts of future values of a time series that follows an ARIMA($p, d, q$) model can be
calculated using R. A convenient option is to use the function sarima.for() in the
astsa package. For example, if z represents the observed time series, the command
sarima.for(z,n.ahead,p,d,q,no.constant=TRUE) will fit the ARIMA($p, d, q$) model
without a constant term to the series and generate forecasts from the fitted model. The
argument n.ahead specifies the number of forecasts to be generated. The output gives the
forecasts and the standard errors of the forecasts, and supplies a graph of the forecasts along
with their $\pm 2$ prediction error limits. Thus, forecasts up to 20 steps ahead for Series C
based on the ARIMA(1, 1, 0) model $(1 - \phi B)(1 - B)z_t = a_t$ are generated as follows:

> library(astsa)

> seriesC=read.table("SeriesC.txt",header=TRUE)

> m1=sarima.for(seriesC,20,1,1,0,no.constant=FALSE)

> m1   # prints the forecasts and their standard errors

This code generates an output object "m1" that includes the forecasts ("pred") and the
prediction errors ("se") of the forecasts. These can be accessed as m1$pred and m1$se, if
needed for further analysis. Figure 5.3 shows a graph of the forecasts and their associated
$\pm 2$ prediction error limits for Series C. We note that the limits become wider as the lead time
increases, reflecting the increased uncertainty due to the fact that the series is nonstationary
and does not vary around a fixed mean level.
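
For readers without the astsa package, a similar calculation can be sketched with base R alone, using arima() and predict() from the stats package. The series below is simulated rather than Series C, and the parameter value 0.8 is an assumption chosen only so that the example runs on its own.

```r
# Sketch with base R only; the series is simulated, not Series C.
set.seed(1)
z <- arima.sim(model = list(order = c(1, 1, 0), ar = 0.8), n = 200)

fit <- arima(z, order = c(1, 1, 0))      # ARIMA(1,1,0) without a constant
fc  <- predict(fit, n.ahead = 20)        # list with components pred and se

# Forecasts with +/- 2 prediction error limits
limits <- cbind(lower    = as.numeric(fc$pred - 2 * fc$se),
                forecast = as.numeric(fc$pred),
                upper    = as.numeric(fc$pred + 2 * fc$se))
print(head(limits, 3))
```

As with the sarima.for() output, the limits widen as the lead time increases.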


5.3 FORECAST FUNCTION AND FORECAST WEIGHTS 

Forecasts are calculated most simply by direct use of the difference equation. From the 
purely computational standpoint, the other model forms are less convenient. However, 
from the point of view of studying the nature of the forecasts, it is useful to consider in 
greater detail the alternative forms discussed in Section 5.1.2 and, in particular, to consider 
the explicit form of the forecast function. 
5.3.1 Eventual Forecast Function Determined by the Autoregressive Operator

At time $t + l$, the ARIMA model may be written as

$$z_{t+l} - \varphi_1 z_{t+l-1} - \cdots - \varphi_{p+d} z_{t+l-p-d} = a_{t+l} - \theta_1 a_{t+l-1} - \cdots - \theta_q a_{t+l-q} \tag{5.3.1}$$

Taking the conditional expectations at time $t$, we have, for $l > q$,

$$\hat{z}_t(l) - \varphi_1 \hat{z}_t(l-1) - \cdots - \varphi_{p+d} \hat{z}_t(l-p-d) = 0 \qquad l > q \tag{5.3.2}$$

where it is understood that $\hat{z}_t(-j) = z_{t-j}$ for $j \ge 0$. This difference equation has the solution

$$\hat{z}_t(l) = b_0^{(t)} f_0(l) + b_1^{(t)} f_1(l) + \cdots + b_{p+d-1}^{(t)} f_{p+d-1}(l) \tag{5.3.3}$$

for $l > q - p - d$. Note that the forecast $\hat{z}_t(l)$ is the complementary function introduced
in Chapter 4. In (5.3.3), $f_0(l), f_1(l), \ldots, f_{p+d-1}(l)$ are functions of the lead time $l$. In
general, they could include polynomials, exponentials, sines and cosines, and products of
these functions. The functions $f_0(l), f_1(l), \ldots, f_{p+d-1}(l)$ consist of $d$ polynomial terms $l^i$,
$i = 0, \ldots, d - 1$, of degree $d - 1$, associated with the nonstationary operator $\nabla^d = (1 - B)^d$,
and $p$ damped exponential and damped sinusoidal terms of the form $G^l$ and $D^l \sin(2\pi f l + F)$,
respectively, associated with the roots of $\phi(B) = 0$ for the stationary autoregressive
operator. That is, the forecast function has the form

$$\hat{z}_t(l) = b_0^{(t)} + b_1^{(t)} l + \cdots + b_{d-1}^{(t)} l^{d-1} + b_d^{(t)} f_d(l) + b_{d+1}^{(t)} f_{d+1}(l) + \cdots + b_{p+d-1}^{(t)} f_{p+d-1}(l)$$

For instance, if $\phi(B) = 0$ has $p$ distinct real roots $G_1^{-1}, \ldots, G_p^{-1}$, then the last $p$ terms in
$\hat{z}_t(l)$ are $b_d^{(t)} G_1^l + b_{d+1}^{(t)} G_2^l + \cdots + b_{p+d-1}^{(t)} G_p^l$. Since the operator $\phi(B)$ is stationary, we have
$|G| < 1$ and $D < 1$ and the last $p$ terms in $\hat{z}_t(l)$ are transient and decay to zero as $l$ increases.
Hence, the forecast function is dominated by the remaining polynomial terms, $\sum_{i=0}^{d-1} b_i^{(t)} l^i$,
as $l$ increases. For a given origin $t$, the coefficients $b_j^{(t)}$ are constants applying to all lead
times $l$, but they change from one origin to the next, adapting themselves appropriately
to the particular part of the series being considered. From now on we call the function
defined by (5.3.3) the eventual forecast function; "eventual" because when it occasionally
happens that $q > p + d$, it supplies the forecasts only for lead times $l > q - p - d$.

We see from (5.3.2) that it is the general autoregressive operator $\varphi(B)$ that determines
the mathematical form of the forecast function, that is, the nature of the $f$'s in (5.3.3).
Specifically, it determines whether the forecast function is to be a polynomial, a mixture
of sines and cosines, a mixture of exponentials, or a combination of these functions.
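
To illustrate how the roots of the autoregressive operator fix the form of the forecast function, the R sketch below factors the Series C operator $1 - 1.8B + 0.8B^2 = (1 - 0.8B)(1 - B)$ with polyroot() and compares the resulting closed form $b_0 + b_1 (0.8)^l$ with forecasts from the difference equation. The two observations are hypothetical, and because $q = 0$ the curve is fitted through $z_t$ and $z_{t-1}$.

```r
# Sketch: form of the eventual forecast function for varphi(B) = 1 - 1.8B + 0.8B^2.
# The z values are hypothetical.
roots <- polyroot(c(1, -1.8, 0.8))       # roots of varphi(B) = 0 in B
G <- sort(Re(1 / roots))                 # reciprocal roots: 0.8 and 1
# G = 1 gives the constant term f_0(l) = 1 (d = 1); G = 0.8 gives f_1(l) = 0.8^l

z_tm1 <- 10.0; z_t <- 10.4
# With q = 0 the curve passes through z_t (l = 0) and z_{t-1} (l = -1)
A <- rbind(c(1, G[1]^0), c(1, G[1]^(-1)))
b <- solve(A, c(z_t, z_tm1))             # adaptive coefficients b_0, b_1

l <- 1:6
closed_form <- b[1] + b[2] * G[1]^l

# Same forecasts from the difference equation
recursive <- numeric(6); p1 <- z_t; p2 <- z_tm1
for (k in l) {
  recursive[k] <- 1.8 * p1 - 0.8 * p2
  p2 <- p1; p1 <- recursive[k]
}
print(cbind(lead = l, closed_form, recursive))
```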


5.3.2 Role of the Moving Average Operator in Fixing the Initial Values 

While the autoregressive operator determines the nature of the eventual forecast function,
the moving average operator is influential in determining how that function is to be "fitted"
to the data and hence how the coefficients $b_0^{(t)}, b_1^{(t)}, \ldots, b_{p+d-1}^{(t)}$ in (5.3.3) are to be calculated
and updated.

For example, consider the IMA(0, 2, 3) process:

$$z_{t+l} - 2z_{t+l-1} + z_{t+l-2} = a_{t+l} - \theta_1 a_{t+l-1} - \theta_2 a_{t+l-2} - \theta_3 a_{t+l-3}$$

Taking the conditional expectation, the forecast function becomes

$$\begin{aligned}
\hat{z}_t(1) &= 2z_t - z_{t-1} - \theta_1 a_t - \theta_2 a_{t-1} - \theta_3 a_{t-2} \\
\hat{z}_t(2) &= 2\hat{z}_t(1) - z_t - \theta_2 a_t - \theta_3 a_{t-1} \\
\hat{z}_t(3) &= 2\hat{z}_t(2) - \hat{z}_t(1) - \theta_3 a_t \\
\hat{z}_t(l) &= 2\hat{z}_t(l-1) - \hat{z}_t(l-2) \qquad l > 3
\end{aligned}$$

Therefore, since $\varphi(B) = (1 - B)^2$ in this model, the eventual forecast function is the unique
straight line

$$\hat{z}_t(l) = b_0^{(t)} + b_1^{(t)} l \qquad l > 1$$

which passes through $\hat{z}_t(2)$ and $\hat{z}_t(3)$ as shown in Figure 5.4. However, note that if the $\theta_3$
term had not been included in the model, then $q - p - d = 0$, and the forecast would have
been given at all lead times by the straight line passing through $\hat{z}_t(1)$ and $\hat{z}_t(2)$.
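
The IMA(0, 2, 3) case can be worked through numerically. In the R sketch below, the $\theta$ values, the observations, and the shocks are all made up for illustration; the point is only that the forecasts for leads 2 and beyond lie on one straight line while the lead 1 forecast does not.

```r
# Sketch: forecasts for an IMA(0,2,3) model; theta, z and a values are made up.
theta <- c(0.5, -0.2, 0.1)
z_tm1 <- 20.0; z_t <- 21.0               # last two observations
a_tm2 <- 0.3; a_tm1 <- -0.4; a_t <- 0.6  # last three shocks

f <- numeric(8)
f[1] <- 2 * z_t - z_tm1 - theta[1] * a_t - theta[2] * a_tm1 - theta[3] * a_tm2
f[2] <- 2 * f[1] - z_t - theta[2] * a_t - theta[3] * a_tm1
f[3] <- 2 * f[2] - f[1] - theta[3] * a_t
for (l in 4:8) f[l] <- 2 * f[l - 1] - f[l - 2]

# Straight line through the pivotal values at leads 2 and 3
b1 <- f[3] - f[2]
b0 <- f[2] - 2 * b1
line <- b0 + b1 * (1:8)
print(round(cbind(lead = 1:8, forecast = f, line = line), 3))
```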

In general, since only one function of the form (5.3.3) can pass through $p + d$ points, the
eventual forecast function is that unique curve of the form required by $\varphi(B)$, which passes
through the $p + d$ "pivotal" values $\hat{z}_t(q), \hat{z}_t(q-1), \ldots, \hat{z}_t(q-p-d+1)$, where
$\hat{z}_t(-j) = z_{t-j}$ $(j = 0, 1, 2, \ldots)$. In the extreme case where $q = 0$, so that the model is of the purely
autoregressive form $\varphi(B)z_t = a_t$, the curve passes through the points $z_t, z_{t-1}, \ldots, z_{t-p-d+1}$.
Thus, the pivotal values can consist of forecasts or of actual values of the series; they are
indicated in the figures by circled points.

The moving average terms help to decide the way in which we "reach back" into the
series to fit the forecast function determined by the autoregressive operator $\varphi(B)$. Figure
5.5 illustrates the situation for the model of order (1, 1, 3) given by
$(1 - \phi B)\nabla z_t = (1 - \theta_1 B - \theta_2 B^2 - \theta_3 B^3)a_t$. The (hypothetical) weight functions indicate the linear functional
dependence of the three forecasts, $\hat{z}_t(1)$, $\hat{z}_t(2)$, and $\hat{z}_t(3)$, on $z_t, z_{t-1}, z_{t-2}, \ldots$. Since the
forecast function contains $p + d = 2$ coefficients, it is uniquely determined by the forecasts
$\hat{z}_t(3)$ and $\hat{z}_t(2)$, that is, by $\hat{z}_t(q)$ and $\hat{z}_t(q - 1)$. We next consider how the forecast weight
functions, referred to above, are determined.

FIGURE 5.4 Eventual forecast function for an IMA(0, 2, 3) process.

FIGURE 5.5 Dependence of forecast function on observations for a (1, 1, 3) process
$(1 - \phi B)\nabla z_t = (1 - \theta_1 B - \theta_2 B^2 - \theta_3 B^3)a_t$.


5.3.3 Lead $l$ Forecast Weights

The fact that the general model may also be written in inverted form,

$$a_t = \pi(B)z_t = (1 - \pi_1 B - \pi_2 B^2 - \pi_3 B^3 - \cdots)z_t \tag{5.3.4}$$

allows us to write the forecast as in (5.1.21). On substituting for the conditional expectations
in (5.1.21), we obtain

$$\hat{z}_t(l) = \sum_{j=1}^{\infty} \pi_j \hat{z}_t(l - j) \tag{5.3.5}$$

where, as before, $\hat{z}_t(-h) = z_{t-h}$ for $h = 0, 1, 2, \ldots$. Thus, in general,

$$\hat{z}_t(l) = \pi_1 \hat{z}_t(l-1) + \cdots + \pi_{l-1}\hat{z}_t(1) + \pi_l z_t + \pi_{l+1} z_{t-1} + \cdots \tag{5.3.6}$$

and, in particular,

$$\hat{z}_t(1) = \pi_1 z_t + \pi_2 z_{t-1} + \pi_3 z_{t-2} + \cdots$$

The forecasts for higher lead times may also be expressed directly as linear functions of
the observations $z_t, z_{t-1}, z_{t-2}, \ldots$. For example, the lead 2 forecast at origin $t$ is

$$\begin{aligned}
\hat{z}_t(2) &= \pi_1 \hat{z}_t(1) + \pi_2 z_t + \pi_3 z_{t-1} + \cdots \\
&= \pi_1 \sum_{j=1}^{\infty} \pi_j z_{t+1-j} + \sum_{j=1}^{\infty} \pi_{j+1} z_{t+1-j} \\
&= \sum_{j=1}^{\infty} \pi_j^{(2)} z_{t+1-j}
\end{aligned}$$

where

$$\pi_j^{(2)} = \pi_1 \pi_j + \pi_{j+1} \qquad j = 1, 2, \ldots \tag{5.3.7}$$

Proceeding in this way, it is readily shown that

$$\hat{z}_t(l) = \sum_{j=1}^{\infty} \pi_j^{(l)} z_{t+1-j} \tag{5.3.8}$$

where

$$\pi_j^{(l)} = \pi_{j+l-1} + \sum_{h=1}^{l-1} \pi_h \pi_j^{(l-h)} \qquad j = 1, 2, \ldots \tag{5.3.9}$$

and $\pi_j^{(1)} = \pi_j$. Alternative methods for computing these weights are given in
Appendix A5.2.

As seen earlier, the $\pi_j$'s themselves may be obtained explicitly by equating coefficients
in

$$\theta(B)(1 - \pi_1 B - \pi_2 B^2 - \cdots) = \varphi(B)$$

Given these values, the $\pi_j^{(l)}$'s may readily be obtained, if so desired, using (5.3.9) or the
results of Appendix A5.2. As an example, consider again the model

$$\nabla^2 z_t = (1 - 0.9B + 0.5B^2)a_t$$

which was fitted to a series, a part of which is shown in Figure 5.6. Equating coefficients in

$$(1 - 0.9B + 0.5B^2)(1 - \pi_1 B - \pi_2 B^2 - \cdots) = 1 - 2B + B^2$$

yields the weights $\pi_j = \pi_j^{(1)}$, from which the weights $\pi_j^{(2)}$ may be computed using (5.3.7).
The two sets of weights are given for $j = 1, 2, \ldots, 12$ in Table 5.2.

FIGURE 5.6 Part of a series fitted by $\nabla^2 z_t = (1 - 0.9B + 0.5B^2)a_t$ with forecast function for origin
$t = 30$, forecast weights, and probability limits.

TABLE 5.2 $\pi$ Weights for the Model $\nabla^2 z_t = (1 - 0.9B + 0.5B^2)a_t$

$j$    $\pi_j^{(1)}$    $\pi_j^{(2)}$
1       1.100            1.700
2       0.490            0.430
3      -0.109           -0.463
4      -0.343           -0.632
5      -0.254           -0.336
6      -0.057            0.013
7       0.076            0.181
8       0.097            0.156
9       0.049            0.050
10     -0.004           -0.032
11     -0.028           -0.054
12     -0.023           -0.026

In this example, the lead 1 and lead 2 forecasts, expressed in terms of the observations
$z_t, z_{t-1}, \ldots$, are

$$\hat{z}_t(1) = 1.10z_t + 0.49z_{t-1} - 0.11z_{t-2} - 0.34z_{t-3} - 0.25z_{t-4} - \cdots$$

and

$$\hat{z}_t(2) = 1.70z_t + 0.43z_{t-1} - 0.46z_{t-2} - 0.63z_{t-3} - 0.34z_{t-4} + \cdots$$

In fact, the weights follow damped sine waves as shown in Figure 5.6.
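
The $\pi$ weights in Table 5.2 can be regenerated by equating coefficients numerically. The R sketch below is a minimal illustration: the helper pi_weights is hypothetical, and it assumes the sign conventions $\varphi(B) = 1 - \varphi_1 B - \cdots$ and $\theta(B) = 1 - \theta_1 B - \cdots$ used in this chapter, under which $\pi_j = \varphi_j + \theta_1 \pi_{j-1} + \cdots + \theta_q \pi_{j-q}$ with $\pi_0 = -1$.

```r
# Sketch: pi weights from theta(B) pi(B) = varphi(B); pi_weights() is hypothetical.
pi_weights <- function(varphi, theta, n) {
  w <- numeric(n)                        # w[j] holds pi_j, j = 1, ..., n
  for (j in seq_len(n)) {
    acc <- if (j <= length(varphi)) varphi[j] else 0
    for (i in seq_along(theta)) {
      # pi_0 = -1 and pi_k = 0 for k < 0 in this sign convention
      prev <- if (j - i > 0) w[j - i] else if (j - i == 0) -1 else 0
      acc <- acc + theta[i] * prev
    }
    w[j] <- acc
  }
  w
}

# nabla^2 z_t = (1 - 0.9B + 0.5B^2) a_t: varphi = (2, -1), theta = (0.9, -0.5)
pi1 <- pi_weights(c(2, -1), c(0.9, -0.5), n = 13)
pi2 <- pi1[1] * pi1[1:12] + pi1[2:13]    # lead 2 weights from (5.3.7)
print(round(cbind(j = 1:12, pi1 = pi1[1:12], pi2 = pi2), 3))
```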


5.4 EXAMPLES OF FORECAST FUNCTIONS AND THEIR UPDATING 

The forecast functions for some special cases of the general ARIMA model will now 
be considered. We exhibit these in the three forms discussed in Section 5.1.2. While the 
forecasts are most easily computed from the difference equation itself, the other forms 
provide insight into the nature of the forecast function in particular cases. 

5.4.1 Forecasting an IMA(0,1,1) Process 

Difference Equation Approach. We first consider the model $\nabla z_t = (1 - \theta B)a_t$. At time
$t + l$, the model may be written as

$$z_{t+l} = z_{t+l-1} + a_{t+l} - \theta a_{t+l-1}$$

Taking conditional expectations at origin $t$ yields

$$\begin{aligned}
\hat{z}_t(1) &= z_t - \theta a_t \\
\hat{z}_t(l) &= \hat{z}_t(l - 1) \qquad l \ge 2
\end{aligned} \tag{5.4.1}$$

Hence, for all lead times, the forecasts at origin $t$ will follow a straight line parallel to the
time axis. Using the fact that $z_t = \hat{z}_{t-1}(1) + a_t$, we can write (5.4.1) in either of two useful
forms.

The first of these is

$$\hat{z}_t(l) = \hat{z}_{t-1}(l) + \lambda a_t \tag{5.4.2}$$

where $\lambda = 1 - \theta$. This form is identical to the general updating form (5.2.5) for this model,
since $\psi_l = \lambda$ and $\hat{z}_{t-1}(l + 1) = \hat{z}_{t-1}(l)$ for all $l \ge 1$. This form implies that having seen
that our previous forecast $\hat{z}_{t-1}(l)$ falls short of the realized value by $a_t$, we adjust it by an
amount $\lambda a_t$. It will be recalled from Section 4.3.1 that $\lambda$ measures the proportion of any
given shock $a_t$, which is permanently absorbed by the "level" of the process. Therefore, it
is reasonable to increase the forecast by that part $\lambda a_t$ of $a_t$, which we expect to be absorbed.

The second way of rewriting (5.4.1) is to write $a_t = z_t - \hat{z}_{t-1}(1) = z_t - \hat{z}_{t-1}(l)$ in (5.4.2)
to obtain

$$\hat{z}_t(l) = \lambda z_t + (1 - \lambda)\hat{z}_{t-1}(l) \tag{5.4.3}$$

This form implies that the new forecast is a linear interpolation at argument $\lambda$ between old
forecast and new observation. Thus, if $\lambda$ is very small, we rely principally on a weighted
average of past data and heavily discount the new observation $z_t$. By contrast, if $\lambda = 1$
($\theta = 0$), the evidence of past data is completely ignored, $\hat{z}_t(l) = z_t$, and the forecast for
all future time is the current value. With $\lambda > 1$, we induce an extrapolation rather than an
interpolation between $\hat{z}_{t-1}(l)$ and $z_t$. The forecast error must now be magnified in (5.4.2)
to indicate the change in the forecast.
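
The equivalence of the two updating forms (5.4.2) and (5.4.3) can be seen by running both side by side. In the following R sketch the value of $\theta$, the starting forecast, and the observations are hypothetical.

```r
# Sketch: the two updating forms (5.4.2) and (5.4.3) for an IMA(0,1,1) model.
# theta, the starting forecast and the observations are hypothetical.
theta  <- 0.7
lambda <- 1 - theta
z <- c(17.0, 17.4, 16.9, 17.2)           # successive observations
prev <- 17.1                             # forecast available before z[1] arrives

for (obs in z) {
  a      <- obs - prev                   # one-step-ahead forecast error a_t
  form_1 <- prev + lambda * a            # (5.4.2): adjust by lambda * a_t
  form_2 <- lambda * obs + (1 - lambda) * prev   # (5.4.3): interpolation
  stopifnot(isTRUE(all.equal(form_1, form_2)))
  prev <- form_1                         # forecast for every lead from this origin
}
print(prev)
```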

Forecast Function in Integrated Form. The eventual forecast function is the solution of
$(1 - B)\hat{z}_t(l) = 0$. Thus, $\hat{z}_t(l) = b_0^{(t)}$, and since $q - p - d = 0$, it provides the forecast for all
lead times, that is,

$$\hat{z}_t(l) = b_0^{(t)} \qquad l > 0 \tag{5.4.4}$$

For any fixed origin, $b_0^{(t)}$ is a constant, and the forecasts for all lead times will follow a
straight line parallel to the time axis. However, the coefficient $b_0^{(t)}$ will be updated as a
new observation becomes available and the origin advances. Thus, the forecast function
can be thought of as a polynomial of degree zero in the lead time $l$, with a coefficient that
is adaptive with respect to the origin $t$.

A comparison of (5.4.4) with (5.4.1) shows that

$$b_0^{(t)} = \hat{z}_t(l) = z_t - \theta a_t$$

Equivalently, by referring to (4.3.4), since the truncated integrated form of the model,
relative to an initial origin $k$, is

$$\begin{aligned}
z_t &= \lambda S_{t-k-1} a_{t-1} + a_t + (z_k - \theta a_k) \\
&= \lambda(a_{t-1} + \cdots + a_{k+1}) + a_t + (z_k - \theta a_k)
\end{aligned}$$

it follows that

$$b_0^{(t)} = \lambda S_{t-k} a_t + (z_k - \theta a_k) = \lambda(a_t + \cdots + a_{k+1}) + (z_k - \theta a_k)$$

Also, $\psi_j = \lambda$ $(j = 1, 2, \ldots)$ and hence the adaptive coefficient $b_0^{(t)}$ can be updated from
origin $t$ to origin $t + 1$ according to

$$b_0^{(t+1)} = b_0^{(t)} + \lambda a_{t+1} \tag{5.4.5}$$

similar to (5.4.2).

Forecast as a Weighted Average of Previous Observations. Since, for this process, the
$\pi_j^{(l)}$ weights of (5.3.8) are also the weights for the one-step-ahead forecast, we can also
write, using (4.3.6),

$$\hat{z}_t(l) = b_0^{(t)} = \lambda z_t + \lambda(1 - \lambda)z_{t-1} + \lambda(1 - \lambda)^2 z_{t-2} + \cdots \tag{5.4.6}$$

Thus, for the IMA(0, 1, 1) model, the forecast for all future time is an exponentially weighted
moving average of current and past $z$'s.

Example: Forecasting Series A. It will be shown in Chapter 7 that Series A is closely
fitted by the model

$$(1 - B)z_t = (1 - 0.7B)a_t$$

In Figure 5.7, the forecasts at origins $t = 39, 40, 41, 42$, and 43 and also at origin $t = 79$ are
shown for lead times $1, 2, \ldots, 20$. The weights $\pi_j$, which for this model are forecast weights
for any lead time, are given in Table 5.3. These weights are shown diagrammatically in
their appropriate positions for the forecast $\hat{z}_{39}(l)$ in Figure 5.7.
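
The exponentially declining weights for this model follow from $\pi_j = \lambda(1 - \lambda)^{j-1}$ with $\lambda = 1 - 0.7 = 0.3$. The short R sketch below evaluates the first twelve weights and checks numerically that a long partial sum is close to unity; the variable names are chosen for this illustration only.

```r
# Sketch: forecast weights pi_j = lambda (1 - lambda)^(j - 1) for theta = 0.7
theta  <- 0.7
lambda <- 1 - theta
j <- 1:12
pi_j <- lambda * (1 - lambda)^(j - 1)
print(round(pi_j, 3))

partial_sum <- sum(lambda * (1 - lambda)^(0:199))   # long partial sum of the weights
print(partial_sum)
```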

Variance Functions. Since for this model, $\psi_j = \lambda$ $(j = 1, 2, \ldots)$, the expression (5.1.16)
for the variance of the lead $l$ forecast errors is

$$V(l) = \sigma_a^2[1 + (l - 1)\lambda^2] \tag{5.4.7}$$

Using the estimate $s_a^2 = 0.101$, appropriate for Series A, in (5.4.7), 50 and 95% probability
limits were calculated and are shown in Figure 5.7 for origin $t = 79$.
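
A brief R sketch of this calculation, using the quoted values $\lambda = 0.3$ and $s_a^2 = 0.101$, is given below. It only evaluates (5.4.7) and the corresponding half-widths; the forecasts themselves are not reproduced, and the variable names are assumptions made for the example.

```r
# Sketch: V(l) from (5.4.7) with lambda = 0.3 and s_a^2 = 0.101 (Series A)
lambda <- 0.3
s2     <- 0.101
l <- 1:20
V <- s2 * (1 + (l - 1) * lambda^2)       # grows linearly with the lead time

half_50 <- qnorm(0.75)  * sqrt(V)        # half-width of the 50% limits
half_95 <- qnorm(0.975) * sqrt(V)        # half-width of the 95% limits
print(round(cbind(lead = l, V, half_50, half_95)[1:5, ], 3))
```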



FIGURE 5.7 Part of Series A with forecasts at origins $t = 39, 40, 41, 42, 43$ and at $t = 79$.

TABLE 5.3 Forecast Weights Applied to Previous $z$'s for Any Lead
Time Used in Forecasting Series A with Model $\nabla z_t = (1 - 0.7B)a_t$

$j$    $\pi_j$     $j$    $\pi_j$
1      0.300       7      0.035
2      0.210       8      0.025
3      0.147       9      0.017
4      0.103       10     0.012
5      0.072       11     0.008
6      0.050       12     0.006

5.4.2 Forecasting an IMA(0,2,2) Process 

Difference Equation Approach. We now consider the model $\nabla^2 z_t = (1 - \theta_1 B - \theta_2 B^2)a_t$.
At time $t + l$, the model may be written as

$$z_{t+l} = 2z_{t+l-1} - z_{t+l-2} + a_{t+l} - \theta_1 a_{t+l-1} - \theta_2 a_{t+l-2}$$

On taking conditional expectations at time $t$, we obtain

$$\begin{aligned}
\hat{z}_t(1) &= 2z_t - z_{t-1} - \theta_1 a_t - \theta_2 a_{t-1} \\
\hat{z}_t(2) &= 2\hat{z}_t(1) - z_t - \theta_2 a_t \\
\hat{z}_t(l) &= 2\hat{z}_t(l - 1) - \hat{z}_t(l - 2) \qquad l \ge 3
\end{aligned}$$

from which the forecasts may be calculated. Forecasting of the series of Figure 5.6 in this
way was illustrated in Section 5.1.2. An alternative way of generating the first $L - 1$ of
$L$ forecasts is via the updating formula (5.2.5),

$$\hat{z}_{t+1}(l) = \hat{z}_t(l + 1) + \psi_l a_{t+1} \tag{5.4.8}$$

The truncated integrated model, as in (4.3.15), is

$$z_t = \lambda_0 S_{t-k-1} a_{t-1} + \lambda_1 S^{(2)}_{t-k-1} a_{t-1} + a_t + b_0^{(k)} + b_1^{(k)}(t - k) \tag{5.4.9}$$

where $\lambda_0 = 1 + \theta_2$ and $\lambda_1 = 1 - \theta_1 - \theta_2$, so that $\psi_j = \lambda_0 + j\lambda_1$ $(j = 1, 2, \ldots)$. Therefore,
the updating function for this model is

$$\hat{z}_{t+1}(l) = \hat{z}_t(l + 1) + (\lambda_0 + l\lambda_1)a_{t+1} \tag{5.4.10}$$
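
The closed form $\psi_j = \lambda_0 + j\lambda_1$ can be checked against the general recursion for the $\psi$ weights. The R sketch below does this for $\theta_1 = 0.9$ and $\theta_2 = -0.5$, the values of the model of Figure 5.6; the variable names are illustrative.

```r
# Sketch: psi_j = lambda_0 + j * lambda_1 for an IMA(0,2,2) model,
# checked against the recursion with varphi(B) = (1 - B)^2 = 1 - 2B + B^2
theta   <- c(0.9, -0.5)
lambda0 <- 1 + theta[2]
lambda1 <- 1 - theta[1] - theta[2]

n <- 10
psi <- numeric(n)
psi[1] <- 2 - theta[1]                   # psi_1 = varphi_1 - theta_1
psi[2] <- 2 * psi[1] - 1 - theta[2]      # psi_2 = varphi_1 psi_1 + varphi_2 - theta_2
for (j in 3:n) psi[j] <- 2 * psi[j - 1] - psi[j - 2]

closed <- lambda0 + (1:n) * lambda1
print(cbind(j = 1:n, recursion = psi, closed = closed))
```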


Forecast in Integrated Form. The eventual forecast function is the solution of
$(1 - B)^2 \hat{z}_t(l) = 0$, that is, $\hat{z}_t(l) = b_0^{(t)} + b_1^{(t)} l$. Since $q - p - d = 0$, the eventual forecast function
provides the forecast for all lead times, that is,

$$\hat{z}_t(l) = b_0^{(t)} + b_1^{(t)} l \qquad l > 0 \tag{5.4.11}$$

Thus, the forecast function is a linear function of the lead time $l$, with coefficients that are
adaptive with respect to the origin $t$. The stochastic model in truncated integrated form is

$$z_{t+l} = \lambda_0 S_{t+l-k-1} a_{t+l-1} + \lambda_1 S^{(2)}_{t+l-k-1} a_{t+l-1} + a_{t+l} + b_0^{(k)} + b_1^{(k)}(t + l - k)$$

and taking expectations at origin $t$, we obtain

$$\begin{aligned}
\hat{z}_t(l) &= \lambda_0 S_{t-k} a_t + \lambda_1\left(l a_t + (l + 1)a_{t-1} + \cdots + (l + t - k - 1)a_{k+1}\right) + b_0^{(k)} + b_1^{(k)}(t + l - k) \\
&= \left[\lambda_0 S_{t-k} a_t + \lambda_1 S^{(2)}_{t-k-1} a_{t-1} + b_0^{(k)} + b_1^{(k)}(t - k)\right] + \left(\lambda_1 S_{t-k} a_t + b_1^{(k)}\right) l
\end{aligned}$$

The adaptive coefficients may thus be identified as

$$\begin{aligned}
b_0^{(t)} &= \lambda_0 S_{t-k} a_t + \lambda_1 S^{(2)}_{t-k-1} a_{t-1} + b_0^{(k)} + b_1^{(k)}(t - k) \\
b_1^{(t)} &= \lambda_1 S_{t-k} a_t + b_1^{(k)}
\end{aligned} \tag{5.4.12}$$

or informally based on the infinite integrated form as $b_0^{(t)} = \lambda_0 S a_t + \lambda_1 S^2 a_{t-1}$ and
$b_1^{(t)} = \lambda_1 S a_t$. Hence, their updating formulas are

$$\begin{aligned}
b_0^{(t)} &= b_0^{(t-1)} + b_1^{(t-1)} + \lambda_0 a_t \\
b_1^{(t)} &= b_1^{(t-1)} + \lambda_1 a_t
\end{aligned} \tag{5.4.13}$$

similar to relations (4.3.17). The additional slope term $b_1^{(t-1)}$, which occurs in the updating
formula for $b_0^{(t)}$, is an adjustment to change the location parameter $b_0$ to a value appropriate
to the new origin. It will also be noted that $\lambda_0$ and $\lambda_1$ are the fractions of the shock $a_t$, which
are transmitted to the location parameter and the slope parameter, respectively.
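
The updating formulas (5.4.13) can be verified on a simulated series. In the R sketch below the shocks are random draws, the starting values of the series are arbitrary, and coef_at is a hypothetical helper that recovers $b_0^{(t)}$ and $b_1^{(t)}$ from the lead 1 and lead 2 forecasts given by the difference equation.

```r
# Sketch: check of the updating formulas (5.4.13) on simulated data.
# Shocks are random draws and the first two z values are arbitrary.
set.seed(42)
theta   <- c(0.9, -0.5)
lambda0 <- 1 + theta[2]
lambda1 <- 1 - theta[1] - theta[2]
n <- 30
a <- rnorm(n, sd = 0.2)
z <- numeric(n)
for (s in 3:n) {
  z[s] <- 2 * z[s - 1] - z[s - 2] + a[s] - theta[1] * a[s - 1] - theta[2] * a[s - 2]
}

# Location and slope at origin s from the forecasts at leads 1 and 2
coef_at <- function(s) {
  f1 <- 2 * z[s] - z[s - 1] - theta[1] * a[s] - theta[2] * a[s - 1]
  f2 <- 2 * f1 - z[s] - theta[2] * a[s]
  c(b0 = 2 * f1 - f2, b1 = f2 - f1)
}
b <- t(sapply(2:n, coef_at))             # one row per origin 2, ..., n
b_prev <- b[-nrow(b), ]                  # origins 2, ..., n - 1
b_now  <- b[-1, ]                        # origins 3, ..., n
a_now  <- a[3:n]

upd0 <- b_prev[, "b0"] + b_prev[, "b1"] + lambda0 * a_now
upd1 <- b_prev[, "b1"] + lambda1 * a_now
print(c(max(abs(upd0 - b_now[, "b0"])), max(abs(upd1 - b_now[, "b1"]))))
```

Both maximum discrepancies are at the level of floating-point rounding error.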


Forecasts as a Weighted Average of Previous Observations. For this model, then, the
forecast function is a straight line that passes through the forecasts $\hat{z}_t(1)$ and $\hat{z}_t(2)$. This
is illustrated for the series in Figure 5.6, which shows the forecasts made at origin $t = 30$,
with appropriate weight functions. It will be seen how dependence of the entire forecast
function on previous $z$'s in the series is a reflection of the dependence of $\hat{z}_t(1)$ and $\hat{z}_t(2)$
on these values. The weight functions for $\hat{z}_t(1)$ and $\hat{z}_t(2)$, plotted in the figure, have been
given in Table 5.2.

The example illustrates once more that while the AR operator $\varphi(B)$ determines the form
of function to be used (a straight line in this case), the MA operator is of importance in
determining the way in which that function is "fitted" to previous data.

Dependence of the Adaptive Coefficients in the Forecast Function on Previous $z$'s. Since
for the general model, the values of the adaptive coefficients in the forecast function are
determined by $\hat{z}_t(q), \hat{z}_t(q - 1), \ldots, \hat{z}_t(q - p - d + 1)$, which can be expressed as functions
of the observations, it follows that the same is true for the adaptive coefficients themselves.

For instance, in the case of the model $\nabla^2 z_t = (1 - 0.9B + 0.5B^2)a_t$ of Figure 5.6,

$$\begin{aligned}
\hat{z}_t(1) &= b_0^{(t)} + b_1^{(t)} = \sum_{j=1}^{\infty} \pi_j^{(1)} z_{t+1-j} \\
\hat{z}_t(2) &= b_0^{(t)} + 2b_1^{(t)} = \sum_{j=1}^{\infty} \pi_j^{(2)} z_{t+1-j}
\end{aligned}$$

so that

$$\begin{aligned}
b_0^{(t)} &= 2\hat{z}_t(1) - \hat{z}_t(2) = \sum_{j=1}^{\infty} \left(2\pi_j^{(1)} - \pi_j^{(2)}\right) z_{t+1-j} \\
b_1^{(t)} &= \hat{z}_t(2) - \hat{z}_t(1) = \sum_{j=1}^{\infty} \left(\pi_j^{(2)} - \pi_j^{(1)}\right) z_{t+1-j}
\end{aligned}$$

These weight functions are plotted in Figure 5.8.

FIGURE 5.8 Weights applied to previous $z$'s determining location and slope for the model
$\nabla^2 z_t = (1 - 0.9B + 0.5B^2)a_t$.

Variance of the Forecast Error. Using (5.1.16) and the fact that $\psi_j = \lambda_0 + j\lambda_1$, the variance
of the lead $l$ forecast error is

$$V(l) = \sigma_a^2\left[1 + (l - 1)\lambda_0^2 + \tfrac{1}{6}l(l - 1)(2l - 1)\lambda_1^2 + \lambda_0\lambda_1 l(l - 1)\right] \tag{5.4.14}$$

Using the estimate $s_a^2 = 0.032$, $\lambda_0 = 0.5$, and $\lambda_1 = 0.6$, the 50 and 95% limits are shown in
Figure 5.6 for the forecast at origin $t = 30$.
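
The closed form (5.4.14) is the sum of squares of $\psi_j = \lambda_0 + j\lambda_1$ written out with the standard formulas for $\sum j$ and $\sum j^2$. The R sketch below compares it with the direct cumulative sum for the quoted values $\lambda_0 = 0.5$, $\lambda_1 = 0.6$, and $s_a^2 = 0.032$; variable names are illustrative.

```r
# Sketch: variance function (5.4.14) versus the direct sum of squared psi weights
lambda0 <- 0.5
lambda1 <- 0.6
s2      <- 0.032
l <- 1:10

closed <- s2 * (1 + (l - 1) * lambda0^2 +
                l * (l - 1) * (2 * l - 1) * lambda1^2 / 6 +
                lambda0 * lambda1 * l * (l - 1))

psi    <- lambda0 + (1:9) * lambda1      # psi_1, ..., psi_9
direct <- s2 * cumsum(c(1, psi^2))       # 1 + psi_1^2 + ... + psi_{l-1}^2
print(round(cbind(lead = l, closed, direct), 4))
```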

5.4.3 Forecasting a General IMA(0, d, q) Process 

As an example, consider the process of order (0, 1, 3):

$$(1 - B)z_{t+l} = (1 - \theta_1 B - \theta_2 B^2 - \theta_3 B^3)a_{t+l}$$

Taking conditional expectations at time $t$, we obtain

$$\begin{aligned}
\hat{z}_t(1) - z_t &= -\theta_1 a_t - \theta_2 a_{t-1} - \theta_3 a_{t-2} \\
\hat{z}_t(2) - \hat{z}_t(1) &= -\theta_2 a_t - \theta_3 a_{t-1} \\
\hat{z}_t(3) - \hat{z}_t(2) &= -\theta_3 a_t \\
\hat{z}_t(l) - \hat{z}_t(l - 1) &= 0 \qquad l = 4, 5, 6, \ldots
\end{aligned}$$

Hence, $\hat{z}_t(l) = \hat{z}_t(3) = b_0^{(t)}$ for all $l > 2$, as expected, since $q - p - d = 2$. As shown in
Figure 5.9, the forecast function makes two initial "jumps," depending on previous $a$'s,
before leveling out to the eventual forecast function.

FIGURE 5.9 Forecast function for an IMA(0, 1, 3) process.

For the IMA(0, $d$, $q$) process, the eventual forecast function satisfies the difference
equation $(1 - B)^d \hat{z}_t(l) = 0$, and has for its solution a polynomial in $l$ of degree $d - 1$:

$$\hat{z}_t(l) = b_0^{(t)} + b_1^{(t)} l + b_2^{(t)} l^2 + \cdots + b_{d-1}^{(t)} l^{d-1}$$

This will provide the forecasts $\hat{z}_t(l)$ for $l > q - d$. The coefficients $b_0^{(t)}, b_1^{(t)}, \ldots, b_{d-1}^{(t)}$ must be
updated progressively as the origin advances. The forecast for origin $t$ will make $q - d$ initial
"jumps," which depend on $a_t, a_{t-1}, \ldots, a_{t-q+1}$, and after this, will follow the polynomial
above.

5.4.4 Forecasting Autoregressive Processes 

Consider a process of order $(p, d, 0)$, $\varphi(B)z_t = a_t$. The eventual forecast function is the
solution of $\varphi(B)\hat{z}_t(l) = 0$. It applies for all lead times and passes through the last $p + d$
available values of the series. For example, the model for the IBM stock series (Series B)
is very nearly

$$(1 - B)z_t = a_t$$

so that

$$\hat{z}_t(l) \approx z_t$$

The best forecast for all future time is very nearly the current value of the stock. The weight
function for $\hat{z}_t(l)$ is a spike at time $t$ and there is no averaging over past history.

Stationary Autoregressive Models. The stationary AR($p$) process $\phi(B)\tilde{z}_t = a_t$, where
$\tilde{z}_t = z_t - \mu$ denotes the deviation from the mean $\mu$, will in general produce a forecast
function that is a mixture of exponentials and damped sines. In particular, for $p = 1$, the model

$$(1 - \phi B)\tilde{z}_t = a_t \qquad -1 < \phi < 1$$

has a forecast function that, for all $l > 0$, is the solution of $(1 - \phi B)\hat{\tilde{z}}_t(l) = 0$. Thus,

$$\hat{\tilde{z}}_t(l) = b_0^{(t)} \phi^l \qquad l > 0 \tag{5.4.15}$$

Also, $\hat{\tilde{z}}_t(1) = \phi \tilde{z}_t$, so that $b_0^{(t)} = \tilde{z}_t$ and

$$\hat{\tilde{z}}_t(l) = \tilde{z}_t \phi^l$$

So, the forecasts for the original process $z_t$ are $\hat{z}_t(l) = \mu + \phi^l(z_t - \mu)$.

Hence, the minimum mean square error forecast predicts the current deviation from
the mean decaying exponentially to zero. In Figure 5.10(a) a time series is shown that is
generated from the process $(1 - 0.5B)\tilde{z}_t = a_t$, with the forecast function at origin $t = 14$.
The course of this function is seen to be determined entirely by the single deviation
$\tilde{z}_{14}$. Similarly, the minimum mean square error forecast for a second-order autoregressive
process is such that the current deviation from the mean is predicted to decay to zero via
a damped sine wave or a mixture of two exponentials. Figure 5.10(b) shows a time series
generated from the process $(1 - 0.75B + 0.50B^2)\tilde{z}_t = a_t$ and the forecast at origin $t = 14$.
Here the course of the forecast function at origin $t$ is determined entirely by the last two
deviations, $\tilde{z}_{14}$ and $\tilde{z}_{13}$.

FIGURE 5.10 Forecast functions for (a) the AR(1) process $(1 - 0.5B)\tilde{z}_t = a_t$, and (b) the AR(2)
process $(1 - 0.75B + 0.5B^2)\tilde{z}_t = a_t$ from a time origin $t = 14$.

Variance Function for the Forecast from an AR(1) Process. Since the AR(1) process at
time $t + l$ may be written as

$$\tilde{z}_{t+l} = a_{t+l} + \phi a_{t+l-1} + \cdots + \phi^{l-1} a_{t+1} + \phi^l \tilde{z}_t$$

it follows from (5.4.15) that

$$e_t(l) = \tilde{z}_{t+l} - \hat{\tilde{z}}_t(l) = a_{t+l} + \phi a_{t+l-1} + \cdots + \phi^{l-1} a_{t+1}$$

Hence,

$$\begin{aligned}
V(l) = \mathrm{var}[e_t(l)] &= \sigma_a^2\left(1 + \phi^2 + \cdots + \phi^{2(l-1)}\right) \\
&= \frac{\sigma_a^2(1 - \phi^{2l})}{1 - \phi^2}
\end{aligned} \tag{5.4.16}$$

We see that for this stationary process, as $l$ tends to infinity the variance increases to a
constant value $\gamma_0 = \sigma_a^2/(1 - \phi^2)$, associated with the variation of the process about the
ultimate forecast $\mu$. This is in contrast to the behavior of forecast variance functions for
nonstationary models that "blow up" for large lead times.
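
The bounded growth of the AR(1) forecast variance is easy to see numerically. In the R sketch below, $\phi = 0.5$ matches the process of Figure 5.10(a), while the mean, the current observation, and $\sigma_a^2 = 1$ are hypothetical values chosen for illustration.

```r
# Sketch: AR(1) forecasts and variance function (5.4.16).
# mu, z_t and sigma2_a are hypothetical.
phi      <- 0.5
mu       <- 10
sigma2_a <- 1
z_t      <- 12.4                         # current observation
l <- 1:10

forecast <- mu + phi^l * (z_t - mu)      # deviation decays geometrically to zero
V        <- sigma2_a * (1 - phi^(2 * l)) / (1 - phi^2)
gamma0   <- sigma2_a / (1 - phi^2)       # limiting value, the process variance

print(round(cbind(lead = l, forecast, V), 4))
print(gamma0)
```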


Nonstationary Autoregressive Models of Order $(p, d, 0)$. For the model

$$\phi(B)\nabla^d z_t = a_t$$

the $d$th difference of the process decays back to its mean when projected several steps
ahead. The mean of $\nabla^d z_t$ will usually be assumed to be zero unless contrary evidence is
available. When needed, it is possible to introduce a nonzero mean by replacing $\nabla^d z_t$ by
the deviation $(\nabla^d z_t - \mu_w)$ in the model. For example, consider the model

$$(1 - \phi B)(\nabla z_t - \mu_w) = a_t \tag{5.4.17}$$

After substituting $t + j$ for $t$ and taking conditional expectations at origin $t$, we readily
obtain [compare with (5.4.15) et seq.]

$$\hat{z}_t(j) - \hat{z}_t(j - 1) - \mu_w = \phi^j(z_t - z_{t-1} - \mu_w)$$

or $\hat{w}_t(j) - \mu_w = \phi^j(w_t - \mu_w)$, where $w_t = \nabla z_t$. This shows how the forecasted difference
decays exponentially from the initial value $w_t = z_t - z_{t-1}$ to its mean value $\mu_w$. On summing
this expression from $j = 1$ to $j = l$, that is, using $\hat{z}_t(l) = \hat{w}_t(l) + \cdots + \hat{w}_t(1) + z_t$, we
obtain the forecast function

$$\hat{z}_t(l) = z_t + \mu_w l + (z_t - z_{t-1} - \mu_w)\frac{\phi(1 - \phi^l)}{1 - \phi} \qquad l \ge 1$$

that approaches asymptotically the straight line

$$f(l) = z_t + \mu_w l + (z_t - z_{t-1} - \mu_w)\frac{\phi}{1 - \phi}$$

with deterministic slope $\mu_w$. If the forecasts are generated using the function sarima.for()
in the astsa package in R, a deterministic slope can be incorporated into the forecast
function by setting the argument no.constant=FALSE. The treatment of the constant term
can have a big impact on the forecasts and should be considered carefully when a possible
trend might be present.
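
The forecast function for the model (5.4.17) and its limiting straight line can be compared numerically. The R sketch below uses made-up values for $\phi$, $\mu_w$, and the last two observations, and checks the closed form against the running sum of forecasted differences.

```r
# Sketch: forecast function for (1 - phi B)(nabla z_t - mu_w) = a_t.
# phi, mu_w and the two observations are made up.
phi   <- 0.6
mu_w  <- 0.2
z_tm1 <- 50.0
z_t   <- 51.0
l <- 1:40

closed <- z_t + mu_w * l + (z_t - z_tm1 - mu_w) * phi * (1 - phi^l) / (1 - phi)

# Same forecasts by accumulating the forecasted differences w_t(j)
w_hat  <- mu_w + phi^l * (z_t - z_tm1 - mu_w)
summed <- z_t + cumsum(w_hat)

asymptote <- z_t + mu_w * l + (z_t - z_tm1 - mu_w) * phi / (1 - phi)
print(max(abs(closed - summed)))         # agreement of the two routes
print(closed[40] - asymptote[40])        # gap to the limiting straight line
```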

We now consider the forecasting of some important mixed models. 
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5.4.5 Forecasting a (1,0,1) Process 

Difference Equation Approach. Consider the stationary model 

(1 - $B)2, = (1 - 0B)a t 
The forecasts are readily obtained from 

f r (l) = fz t — 0a t 

z,(l) = </>!,(/ - 1) l >2 


(5.4.18) 


The forecasts decay geometrically to the mean, as in the first-order autoregressive process, 
but with a lead 1 forecast modified by a factor depending on a t = z t — z t _f \). The i// 
weights are 

Wj = (4> - J = 1 , 2 ,... 

and hence, using (5.2.5), the updated forecasts for lead times 1,2,... ,L — 1 could be 
obtained from previous forecasts for lead times 2,3,.... L according to 

z t+ i(l) = z,(l + !) + ($- 0)</>' _1 o, + 1 


Integrated Form. The eventual forecast function for all / > 0 is the solution of (1 — 
f>B)2,(l ) = 0, that is, 

2,(1) = l > 0 

However, 

!,(/) = b { Q f = fz, - Qa t = (l “ ^ l t + ^!/-i(l) 4> 

Thus, 

2,(1)= (l-D^ + i^td) ( l } ' (5.4.19) 

Hence, the forecasted deviation at lead / decays exponentially from an initial value, which 
is a linear interpolation between the previous lead 1 forecasted deviation and the current 
deviation. When <fi is equal to unity, the forecast for all lead times becomes the familiar 
exponentially weighted moving average and (5.4.19) becomes equal to (5.4.3). 


Weights Applied to Previous Observations. The $\pi$ weights, and hence the weights applied
to previous observations to obtain the lead 1 forecasts, are

$$\pi_j = (\phi - \theta)\theta^{j-1} \qquad j = 1, 2, \ldots$$

Note that the weights for this stationary process sum to $(\phi - \theta)/(1 - \theta)$ and not to unity.
If $\phi$ were equal to 1, the process would become a nonstationary IMA(0,1,1) process,
the weights would then sum to unity, and the behavior of the generated series would be
independent of the level of $z_t$.

For example, Series A is later fitted to a (1,0,1) model with $\phi = 0.9$ and $\theta = 0.6$,
and hence the weights are $\pi_1 = 0.30$, $\pi_2 = 0.18$, $\pi_3 = 0.11$, $\pi_4 = 0.06$, ..., which sum to
0.75. The forecasts (5.4.19) decay very slowly to the mean, and for short lead times are
practically indistinguishable from the forecasts obtained from the alternative IMA(0,1,1)
model $\nabla z_t = a_t - 0.7a_{t-1}$, for which the weights are $\pi_1 = 0.30$, $\pi_2 = 0.21$, $\pi_3 = 0.15$,
$\pi_4 = 0.10$, and so on, and sum to unity. The latter model has the advantage that it does not
tie the process to a fixed mean.
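
The two sets of weights in this example are easily tabulated. The following sketch uses hypothetical function names and truncates the infinite sums after a finite number of terms, which is an approximation.

```python
def arma11_pi_weights(phi, theta, n):
    """pi_j = (phi - theta) * theta**(j - 1), j = 1..n, for the (1,0,1) model."""
    return [(phi - theta) * theta**(j - 1) for j in range(1, n + 1)]


def ima011_pi_weights(theta, n):
    """pi_j = (1 - theta) * theta**(j - 1), j = 1..n, for the IMA(0,1,1) model."""
    return [(1 - theta) * theta**(j - 1) for j in range(1, n + 1)]


series_a_weights = arma11_pi_weights(0.9, 0.6, 200)    # sums to about 0.75
alternative_weights = ima011_pi_weights(0.7, 400)      # sums to about 1
```

The first few weights of the two models are close, which is why their short-lead forecasts are nearly the same, while the difference in the sums reflects the fixed mean of the stationary model.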

Variance Function. Since the $\psi$ weights are given by

$$\psi_j = (\phi - \theta)\phi^{j-1} \qquad j = 1, 2, \ldots$$

it follows that the variance function is

$$V(l) = \sigma_a^2\left[1 + (\phi - \theta)^2\,\frac{1 - \phi^{2(l-1)}}{1 - \phi^2}\right] \tag{5.4.20}$$

which increases asymptotically to the value $\sigma_a^2(1 - 2\phi\theta + \theta^2)/(1 - \phi^2)$, the variance $\gamma_0$ of
the process.
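
The variance function (5.4.20) is the closed form of $\sigma_a^2(1 + \sum_{j=1}^{l-1}\psi_j^2)$, and the two can be compared numerically. The sketch below is illustrative; the function names and the default $\sigma_a^2 = 1$ are assumptions.

```python
def arma11_variance_by_sum(phi, theta, lead, sigma2_a=1.0):
    """V(l) = sigma_a^2 * (1 + sum of psi_j^2 for j = 1..l-1) with psi_j = (phi-theta)*phi**(j-1)."""
    psi_sq_sum = sum(((phi - theta) * phi**(j - 1)) ** 2 for j in range(1, lead))
    return sigma2_a * (1 + psi_sq_sum)


def arma11_variance_closed_form(phi, theta, lead, sigma2_a=1.0):
    """Closed form (5.4.20)."""
    return sigma2_a * (1 + (phi - theta) ** 2 * (1 - phi ** (2 * (lead - 1))) / (1 - phi**2))


def arma11_process_variance(phi, theta, sigma2_a=1.0):
    """Limiting value gamma_0 of the variance function."""
    return sigma2_a * (1 - 2 * phi * theta + theta**2) / (1 - phi**2)
```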


5.4.6 Forecasting a (1,1,1) Process 

Another important mixed model is the nonstationary (1,1,1) process:

$$(1 - \phi B)(1 - B)z_t = (1 - \theta B)a_t$$

Difference Equation Approach. At time $t + l$, the model may be written

$$z_{t+l} = (1 + \phi)z_{t+l-1} - \phi z_{t+l-2} + a_{t+l} - \theta a_{t+l-1}$$

On taking conditional expectations, we obtain

$$\hat{z}_t(1) = (1 + \phi)z_t - \phi z_{t-1} - \theta a_t$$

$$\hat{z}_t(l) = (1 + \phi)\hat{z}_t(l - 1) - \phi\hat{z}_t(l - 2) \qquad l > 1 \tag{5.4.21}$$


Integrated Form. Since $q < p + d$, the eventual forecast function for all $l > 0$ is the solution
of $(1 - \phi B)(1 - B)\hat{z}_t(l) = 0$, which is

$$\hat{z}_t(l) = b_0^{(t)} + b_1^{(t)}\phi^l$$

Substituting for $\hat{z}_t(1)$ and $\hat{z}_t(2)$ in (5.4.21), we find explicitly that

$$b_0^{(t)} = z_t + \frac{\phi}{1 - \phi}(z_t - z_{t-1}) - \frac{\theta}{1 - \phi}a_t$$

$$b_1^{(t)} = \frac{\theta a_t - \phi(z_t - z_{t-1})}{1 - \phi}$$

Thus, finally,

$$\hat{z}_t(l) = z_t + \phi\,\frac{1 - \phi^l}{1 - \phi}(z_t - z_{t-1}) - \theta\,\frac{1 - \phi^l}{1 - \phi}a_t \tag{5.4.22}$$

It is evident that for large $l$, the forecast tends to $b_0^{(t)}$.
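
The agreement between the recursion (5.4.21) and the explicit solution (5.4.22) can be verified with a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, and the current shock $a_t$ is taken as known.

```python
def arima111_forecasts(z_t, z_prev, a_t, phi, theta, max_lead):
    """Forecasts at leads 1..max_lead from the difference equation (5.4.21)."""
    lead1 = (1 + phi) * z_t - phi * z_prev - theta * a_t
    forecasts = [lead1]
    two_back, one_back = z_t, lead1          # z_hat_t(0) = z_t
    for _ in range(1, max_lead):
        nxt = (1 + phi) * one_back - phi * two_back
        forecasts.append(nxt)
        two_back, one_back = one_back, nxt
    return forecasts


def arima111_forecast_integrated(z_t, z_prev, a_t, phi, theta, lead):
    """Explicit form (5.4.22)."""
    g = (1 - phi**lead) / (1 - phi)
    return z_t + phi * g * (z_t - z_prev) - theta * g * a_t


def arima111_limit(z_t, z_prev, a_t, phi, theta):
    """Value b_0 approached by the forecasts as the lead time increases."""
    return z_t + (phi * (z_t - z_prev) - theta * a_t) / (1 - phi)
```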


Weights Applied to Previous Observations. Eliminating $a_t$ from (5.4.22), we obtain the
alternative form for the forecast in terms of previous $z$'s:

$$\hat{z}_t(l) = \left[1 - \frac{\theta - \phi}{1 - \phi}(1 - \phi^l)\right]z_t + \left[\frac{\theta - \phi}{1 - \phi}(1 - \phi^l)\right]\bar{z}_{t-1}(\theta) \tag{5.4.23}$$

where $\bar{z}_{t-1}(\theta)$ is an exponentially weighted moving average with parameter $\theta$, that is,
$\bar{z}_{t-1}(\theta) = (1 - \theta)\sum_{j=1}^{\infty}\theta^{j-1}z_{t-j}$. Thus, the $\pi$ weights for the process consist of a "spike"
at time $t$ and an EWMA starting at time $t - 1$. If we refer to $(1 - \alpha)x + \alpha y$ as a linear
interpolation between $x$ and $y$ at argument $\alpha$, the forecast (5.4.23) is a linear interpolation
between $z_t$ and $\bar{z}_{t-1}(\theta)$. The argument for lead time 1 is $\theta - \phi$, but as the lead time
is increased, the argument approaches $(\theta - \phi)/(1 - \phi)$. For example, when $\theta = 0.9$ and
$\phi = 0.5$, the lead 1 forecast is

$$\hat{z}_t(1) = 0.6z_t + 0.4\bar{z}_{t-1}(\theta)$$

and for long lead times, the forecast approaches

$$\hat{z}_t(\infty) = 0.2z_t + 0.8\bar{z}_{t-1}(\theta)$$


5.5 USE OF STATE-SPACE MODEL FORMULATION FOR EXACT 
FORECASTING 

5.5.1 State-Space Model Representation for the ARIMA Process 

The use of state-space models for time series analysis began with the work of Kalman 
(1960) and many of the early developments took place in the field of engineering. These 
models consist of a state equation that describes the evolution of a dynamic system in time, 
and a measurement equation that represents the observations as linear combinations of the 
unobserved state variable corrupted by additive noise. In engineering applications, the state 
variable generally represents a well-defined set of physical variables, but these variables 
are not directly observable, and the state equation represents the dynamics that govern the 
system. In statistical applications, the state-space model is a convenient form to represent 
many types of models, including autoregressive-moving average (ARMA) models, structural
component models of “signal-plus-noise” form, or time-varying parameter models.
In the literature, state-space models have been used for forecasting, maximum likelihood 
estimation of parameters, signal extraction, seasonal adjustments, and other applications 
(see, for example, Durbin and Koopman, 2012). In this section, we introduce the state-space 
form of an ARIMA model and discuss its use in exact finite sample forecasting. Other
applications involving the use of state-space models for likelihood calculations, estimation
of structural components, treatment of missing values, and applications related to vector
ARMA models will be discussed in Sections 7.4, 9.4, 13.3, and 14.6.

For an ARIMA($p$, $d$, $q$) process $\varphi(B)z_t = \theta(B)a_t$, define the forecasts $\hat{z}_t(j) = E_t[z_{t+j}]$
as in Section 5.1, for $j = 0, 1, \ldots, r$, with $r = \max(p + d, q + 1)$, and $\hat{z}_t(0) = z_t$. From the
updating equations (5.2.5), we have $\hat{z}_t(j - 1) = \hat{z}_{t-1}(j) + \psi_{j-1}a_t$, $j = 1, 2, \ldots, r - 1$. Also
for $j = r > q$, recall from (5.3.2) that

$$\hat{z}_t(j - 1) = \hat{z}_{t-1}(j) + \psi_{j-1}a_t = \sum_{i=1}^{p+d}\varphi_i\hat{z}_{t-1}(j - i) + \psi_{j-1}a_t$$

So we define the “state” vector at time $t$, $Y_t$, with $r$ components as
$Y_t = (z_t, \hat{z}_t(1), \ldots, \hat{z}_t(r - 1))'$. Then from the relations above, we find that the vector $Y_t$
satisfies the first-order system of equations:







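$$Y_t = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ \varphi_r & \varphi_{r-1} & \varphi_{r-2} & \cdots & \varphi_1 \end{bmatrix} Y_{t-1} + \begin{bmatrix} 1 \\ \psi_1 \\ \vdots \\ \psi_{r-2} \\ \psi_{r-1} \end{bmatrix} a_t \tag{5.5.1}$$

where $\varphi_i = 0$ if $i > p + d$. So we have

$$Y_t = \Phi Y_{t-1} + \Psi a_t \tag{5.5.2}$$

together with the observation equation

$$Z_t = z_t + N_t = [1, 0, \ldots, 0]Y_t + N_t = HY_t + N_t \tag{5.5.3}$$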

where the additional noise $N_t$ would be present only if the process $z_t$ is observed subject to
additional white noise; otherwise, we simply have $z_t = HY_t$. The last two equations above
constitute what is known as a state-space representation of the model, which consists of a
state or transition equation (5.5.2) and an observation equation (5.5.3), and $Y_t$ is known as
the state vector. We note that there are many other constructions of the state vector $Y_t$ that
will give rise to state-space equations of the general form of (5.5.2) and (5.5.3); that is, the
state-space form of an ARIMA model is not unique. The two equations of the form above,
in general, represent what is known as a state-space model, with unobservable state vector
$Y_t$ and observations $Z_t$, and can arise in time series settings more general than the context
of ARIMA models.
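
To make the construction concrete, the following Python sketch builds the transition matrix and the vector multiplying $a_t$ in the state equation for given generalized autoregressive and moving average coefficients, and checks by simulation that the first state component reproduces the series generated by the difference equation. The function names, the use of plain lists for matrices, and the zero starting values are assumptions made for illustration.

```python
def arima_state_space(varphi, theta):
    """Return (Phi, Psi) for generalized AR coefficients `varphi` (length p + d)
    and MA coefficients `theta` (length q), with r = max(p + d, q + 1)."""
    r = max(len(varphi), len(theta) + 1)
    vp = list(varphi) + [0.0] * (r - len(varphi))   # varphi_i = 0 for i > p + d
    th = list(theta) + [0.0] * r                    # theta_j = 0 for j > q
    psi = [1.0]                                     # psi_0 = 1
    for j in range(1, r):
        psi.append(sum(vp[i - 1] * psi[j - i] for i in range(1, j + 1)) - th[j - 1])
    # Rows 1..r-1 shift the forecasts; the last row carries varphi_r, ..., varphi_1.
    Phi = [[1.0 if col == row + 1 else 0.0 for col in range(r)] for row in range(r - 1)]
    Phi.append([vp[r - 1 - col] for col in range(r)])
    return Phi, psi


def simulate_state_space(Phi, Psi, shocks):
    """Run Y_t = Phi Y_{t-1} + Psi a_t from Y_0 = 0; return z_t, the first element of Y_t."""
    r = len(Psi)
    Y = [0.0] * r
    z = []
    for a in shocks:
        Y = [sum(Phi[i][k] * Y[k] for k in range(r)) + Psi[i] * a for i in range(r)]
        z.append(Y[0])
    return z


def simulate_difference_equation(varphi, theta, shocks):
    """Direct recursion of varphi(B) z_t = theta(B) a_t with zero pre-sample values."""
    z = []
    for t, a in enumerate(shocks):
        value = a
        value += sum(c * z[t - i] for i, c in enumerate(varphi, start=1) if t - i >= 0)
        value -= sum(c * shocks[t - j] for j, c in enumerate(theta, start=1) if t - j >= 0)
        z.append(value)
    return z
```

For the ARIMA(1,1,1) model with $\phi = 0.5$ and $\theta = 0.9$, for instance, the call arima_state_space([1.5, -0.5], [0.9]) gives a two-dimensional state with $\psi_1 = 1 + \phi - \theta = 0.6$.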

Consider a state-space model of a slightly more general form, with state equation

$$Y_t = \Phi_t Y_{t-1} + a_t \tag{5.5.4}$$

and observation equation

$$Z_t = H_t Y_t + N_t \tag{5.5.5}$$

where it is assumed that $a_t$ and $N_t$ are independent white noise processes, $a_t$ is a vector
white noise process with covariance matrix $\Sigma_a$, and $N_t$ has variance $\sigma_N^2$. In this model, the
(unobservable) state vector $Y_t$ summarizes the state of the dynamic system through
time $t$, and the state equation (5.5.4) describes the evolution of the dynamic system in time,
while the measurement equation (5.5.5) indicates that the observations $Z_t$ consist of linear
combinations of the state variables corrupted by additive white noise. The matrix $\Phi_t$ in
(5.5.4) is an $r \times r$ transition matrix and $H_t$ in (5.5.5) is a $1 \times r$ vector, which are allowed to
vary with time $t$. Often, in applications these are constant matrices, $\Phi_t \equiv \Phi$ and $H_t \equiv H$
for all $t$, as in the state-space form (5.5.2) and (5.5.3) of the
ARIMA model. In this case, the system or model is said to be time invariant. The minimal
dimension $r$ of the state vector $Y_t$ in a state-space model needs to be sufficiently large so
that the dynamics of the system can be represented by the simple Markovian (first-order)
structure as in (5.5.4).


5.5.2 Kalman Filtering Relations for Use in Prediction 

For the general state-space model (5.5.4) and (5.5.5), define the finite sample optimal
(minimum mean square error matrix) estimate of the state vector $Y_{t+l}$ based on observations
$Z_t, \ldots, Z_1$ over the finite past time period, as

$$\hat{Y}_{t+l|t} = E[Y_{t+l} \mid Z_t, \ldots, Z_1]$$

with

$$V_{t+l|t} = E[(Y_{t+l} - \hat{Y}_{t+l|t})(Y_{t+l} - \hat{Y}_{t+l|t})']$$

equal to the error covariance matrix. A convenient computational procedure, known as the
Kalman filter equations, is then available to obtain the current estimate $\hat{Y}_{t|t}$, in particular.
It is known that, starting from some appropriate initial values $Y_0 \equiv \hat{Y}_{0|0}$ and $V_0 \equiv V_{0|0}$,
the optimal filtered estimate, $\hat{Y}_{t|t}$, is given through the following recursive relations:

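$$\hat{Y}_{t|t} = \hat{Y}_{t|t-1} + K_t(Z_t - H_t\hat{Y}_{t|t-1}) \tag{5.5.6}$$

where

$$K_t = V_{t|t-1}H_t'[H_t V_{t|t-1}H_t' + \sigma_N^2]^{-1} \tag{5.5.7}$$

with

$$\hat{Y}_{t|t-1} = \Phi_t\hat{Y}_{t-1|t-1} \qquad V_{t|t-1} = \Phi_t V_{t-1|t-1}\Phi_t' + \Sigma_a \tag{5.5.8}$$

and

$$V_{t|t} = [I - K_t H_t]V_{t|t-1} = V_{t|t-1} - V_{t|t-1}H_t'[H_t V_{t|t-1}H_t' + \sigma_N^2]^{-1}H_t V_{t|t-1} \tag{5.5.9}$$

for $t = 1, 2, \ldots$.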

In (5.5.6), the quantity $a_{t|t-1} = Z_t - H_t\hat{Y}_{t|t-1} \equiv Z_t - \hat{Z}_{t|t-1}$ is called the (finite sample)
innovation at time $t$, because it is the new information provided by the measurement $Z_t$
that was not available from the previous observed (finite) history of the system. The
factor $K_t$ is called the Kalman gain matrix. The filtering procedure in (5.5.6) has the
recursive “prediction-correction” or “updating” form, and the validity of these equations
as representing the minimum mean square error predictor can readily be verified through
the principles of updating. For example, verification of (5.5.6) follows from the principle,
for linear prediction, that

$$E[Y_t \mid Z_t, \ldots, Z_1] = E[Y_t \mid Z_t - \hat{Z}_{t|t-1}, Z_{t-1}, \ldots, Z_1]$$

$$= E[Y_t \mid Z_{t-1}, \ldots, Z_1] + E[Y_t \mid Z_t - \hat{Z}_{t|t-1}]$$

since $a_{t|t-1} = Z_t - \hat{Z}_{t|t-1}$ is independent of $Z_{t-1}, \ldots, Z_1$. From (5.5.6), it is seen that the
estimate of $Y_t$ based on observations through time $t$ equals the prediction of $Y_t$ from
observations through time $t - 1$ updated by the factor $K_t$ times the innovation $a_{t|t-1}$. Equation
(5.5.7) indicates that $K_t$ can be interpreted as the regression coefficients of $Y_t$ on the
innovation $a_{t|t-1}$, with $\mathrm{var}[a_{t|t-1}] = H_t V_{t|t-1}H_t' + \sigma_N^2$ and $\mathrm{cov}[Y_t, a_{t|t-1}] = V_{t|t-1}H_t'$ following
directly from (5.5.5) since $a_{t|t-1} = H_t(Y_t - \hat{Y}_{t|t-1}) + N_t$. Thus, the general updating
relation is

$$\hat{Y}_{t|t} = \hat{Y}_{t|t-1} + \mathrm{cov}[Y_t, a_{t|t-1}]\{\mathrm{var}[a_{t|t-1}]\}^{-1}a_{t|t-1}$$

where $a_{t|t-1} = Z_t - \hat{Z}_{t|t-1}$, and the relation in (5.5.9) is the usual updating of the error
covariance matrix to account for the new information available from the innovation $a_{t|t-1}$,
while the prediction relations (5.5.8) follow directly from (5.5.4).

In general, forecasts of future state values are available directly as $\hat{Y}_{t+l|t} = \Phi_{t+l}\hat{Y}_{t+l-1|t}$
for $l = 1, 2, \ldots$, with the covariance matrix of the forecast errors generated recursively
essentially through (5.5.8) as

$$V_{t+l|t} = \Phi_{t+l}V_{t+l-1|t}\Phi_{t+l}' + \Sigma_a$$

Finally, forecasts of future observations, $Z_{t+l} = H_{t+l}Y_{t+l} + N_{t+l}$, are then available as
$\hat{Z}_{t+l|t} = H_{t+l}\hat{Y}_{t+l|t}$ with forecast error variance

$$v_{t+l|t} = E[(Z_{t+l} - \hat{Z}_{t+l|t})^2] = H_{t+l}V_{t+l|t}H_{t+l}' + \sigma_N^2$$

Use for Exact Forecasting in ARIMA Models. For ARIMA models, with state-space
representation (5.5.2) and (5.5.3) and $Z_t = z_t = HY_t$ with $H = [1, 0, \ldots, 0]$, the Kalman
filtering procedure constitutes an alternative method to obtain exact finite sample
forecasts, based on data $z_t, z_{t-1}, \ldots, z_1$, for future values in the ARIMA process, subject to
specification of appropriate initial conditions to use in (5.5.6) to (5.5.9). For stationary
zero-mean processes $z_t$, the appropriate initial values are $\hat{Y}_{0|0} = 0$, a vector of zeros, and
$V_{0|0} = \mathrm{cov}[Y_0] \equiv V_*$, the covariance matrix of $Y_0$, which can easily be determined under
stationarity through the definition of $Y_t$. Specifically, since the state vector $Y_t$ follows
the stationary vector AR(1) model $Y_t = \Phi Y_{t-1} + \Psi a_t$, its covariance matrix $V_* = \mathrm{cov}[Y_t]$
satisfies $V_* = \Phi V_*\Phi' + \sigma_a^2\Psi\Psi'$, which can be readily solved for $V_*$. For nonstationary
ARIMA processes, additional assumptions need to be specified (see, for example, Ansley
and Kohn (1985) and Bell and Hillmer (1987)).
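
One simple way to solve the equation for the initial covariance matrix in the stationary case is to iterate it to convergence. The sketch below does this with plain Python lists. The helper names, the fixed number of iterations, and the choice of fixed-point iteration (rather than solving the linear equations directly) are assumptions made for illustration.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def stationary_state_covariance(Phi, Psi, sigma2_a=1.0, iterations=500):
    """Iterate V <- Phi V Phi' + sigma_a^2 Psi Psi' for a stable transition matrix Phi."""
    r = len(Psi)
    Q = [[sigma2_a * Psi[i] * Psi[j] for j in range(r)] for i in range(r)]
    V = [row[:] for row in Q]
    Phi_t = transpose(Phi)
    for _ in range(iterations):
        PVP = matmul(matmul(Phi, V), Phi_t)
        V = [[PVP[i][j] + Q[i][j] for j in range(r)] for i in range(r)]
    return V


# ARMA(1,1) with phi = 0.9, theta = 0.6: state (z_t, z_hat_t(1))', Psi = (1, phi - theta)'
V_star = stationary_state_covariance([[0.0, 1.0], [0.0, 0.9]], [1.0, 0.3])
```

The leading element of V_star is the process variance $\gamma_0 = \sigma_a^2(1 - 2\phi\theta + \theta^2)/(1 - \phi^2)$, in agreement with the limiting value of the variance function for the (1,0,1) process.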

The forecasts of the ARIMA process $z_t$ are obtained recursively as indicated above,
with $l$-step-ahead forecast $\hat{z}_{t+l|t} = H\hat{Y}_{t+l|t}$, the first element of the vector $\hat{Y}_{t+l|t}$, where

$$\hat{Y}_{t+l|t} = \Phi\hat{Y}_{t+l-1|t}$$

with forecast error variance $v_{t+l|t} = HV_{t+l|t}H'$. The “steady-state” values of the Kalman
filtering procedure $l$-step-ahead forecasts $\hat{z}_{t+l|t}$ and their forecast error variances $v_{t+l|t}$,
which are rapidly approached as $t$ increases, will be identical to the expressions given in
Sections 5.1 and 5.2, $\hat{z}_t(l)$ and $V(l) = \sigma_a^2(1 + \sum_{j=1}^{l-1}\psi_j^2)$.

In particular, for the ARIMA process in state-space form, we can obtain the exact (finite
sample) one-step-ahead forecasts:

$$\hat{z}_{t|t-1} = E[z_t \mid z_{t-1}, \ldots, z_1] = H\hat{Y}_{t|t-1}$$

and their error variances $v_t = HV_{t|t-1}H'$, conveniently through the Kalman filtering
equations (5.5.6)-(5.5.9). This can be particularly useful for evaluation of the likelihood
function, based on $n$ observations $z_1, \ldots, z_n$ from the ARIMA process, applied to the problem
of maximum likelihood estimation of model parameters (see, for example, Jones (1980)
and Gardner et al. (1980)). This will be discussed again in Section 7.4.


Innovations Form of State-Space Model and Steady State for Time-Invariant Models.
One particular alternative form of the general state variable model, referred to as the
innovations or prediction error representation, is worth noting. If we set $Y_t^* = \hat{Y}_{t|t-1}$ and
$a_t^* = a_{t|t-1} = Z_t - H_t\hat{Y}_{t|t-1}$, then from (5.5.6) and (5.5.8) we have

$$Y_{t+1}^* = \Phi_{t+1}Y_t^* + \Phi_{t+1}K_t a_t^* \equiv \Phi_{t+1}Y_t^* + \Psi_t^* a_t^* \qquad \text{and} \qquad Z_t = H_t Y_t^* + a_t^*$$

which is also of the general form of a state-space model but with the same white noise
process $a_t^*$ (the one-step-ahead prediction errors) involved in both the transition and
observation equations.

In the “stationary case” (i.e., time-invariant and stable case) of the state-space model,
where $\Phi_t \equiv \Phi$ and $H_t \equiv H$ in (5.5.4) and (5.5.5) are constant matrices and $\Phi$ has all
eigenvalues less than 1 in absolute value, we can obtain the steady-state form of the innovations
representation by setting $Y_t^* = E[Y_t \mid Z_{t-1}, Z_{t-2}, \ldots]$, the projection of $Y_t$ based on the
infinite past of $\{Z_t\}$. In this case, in the Kalman filter relations (5.5.7) to (5.5.9), the error
covariance matrix $V_{t+1|t}$ approaches the steady-state matrix $V = \lim_{t\to\infty}V_{t+1|t}$ as $t \to \infty$,
which satisfies

$$V = \Phi V\Phi' - \Phi VH'[HVH' + \sigma_N^2]^{-1}HV\Phi' + \Sigma_a$$

Then, also, the Kalman gain matrix $K_t$ in (5.5.7) approaches the steady-state matrix,
$K_t \to K$, where $K = VH'[HVH' + \sigma_N^2]^{-1}$, $a_t^* = a_{t|t-1}$ tends to $a_t = Z_t - HY_t^* \equiv
Z_t - E[Z_t \mid Z_{t-1}, Z_{t-2}, \ldots]$, the one-step-ahead prediction errors, and $\sigma_{t|t-1}^2 = \mathrm{var}[a_{t|t-1}] \to
\sigma_a^2 = \mathrm{var}[a_t]$, where $\sigma_a^2 = HVH' + \sigma_N^2$, as $t \to \infty$. These steady-state filtering results for
the time-invariant model case also hold under slightly weaker conditions than stability
of the transition matrix $\Phi$ (e.g., Harvey (1989), Section 3.3), such as in the nonstationary
random walk plus noise model discussed in the example of Section 5.5.3. Hence, in
the time-invariant situation, the state variable model can be expressed in the steady-state
innovation or prediction error form as

$$Y_{t+1}^* = \Phi Y_t^* + \Phi K a_t \equiv \Phi Y_t^* + \Psi^* a_t \qquad \text{and} \qquad Z_t = HY_t^* + a_t \tag{5.5.10}$$

In particular, for the ARIMA process $\varphi(B)z_t = \theta(B)a_t$ with no additional observation
error so that $Z_t = z_t$, a prediction error form (5.5.10) of the state-space model can
be given with state vector $Y_{t+1}^* = (\hat{z}_t(1), \ldots, \hat{z}_t(r^*))'$ of dimension $r^* = \max(p + d, q)$,
$\Psi^* = (\psi_1, \ldots, \psi_{r^*})'$, and observation equation $z_t = \hat{z}_{t-1}(1) + a_t$. For example, consider
the ARMA(1,1) process $(1 - \phi B)z_t = (1 - \theta B)a_t$. In addition to the state-space form
with state equation given by (5.5.1) and $Y_t = (z_t, \hat{z}_t(1))'$, we have the innovations form
of its state-space representation simply as $\hat{z}_t(1) = \phi\hat{z}_{t-1}(1) + \psi^* a_t$ and $z_t = \hat{z}_{t-1}(1) + a_t$,
or $Y_{t+1}^* = \phi Y_t^* + \psi^* a_t$ and $z_t = Y_t^* + a_t$ with the (single) state variable $Y_t^* = \hat{z}_{t-1}(1)$ and
$\psi^* = \psi_1 = \phi - \theta$.

5.5.3 Smoothing Relations in the State Variable Model 

Another problem of interest within the state variable model framework, particularly in
applications to economics and business, is to obtain “smoothed” estimates of past values
of the state vector $Y_t$ given the observations $Z_1, \ldots, Z_n$ through some fixed time $n$. One
convenient method to obtain the desired estimates, known as the fixed-interval smoothing
algorithm, makes use of the Kalman filter estimates $\hat{Y}_{t|t}$ obtainable through (5.5.6)-(5.5.9).
The smoothing algorithm produces the minimum MSE estimator (predictor) of the state
value $Y_t$ given the observations through time $n$, $\hat{Y}_{t|n} = E[Y_t \mid Z_1, \ldots, Z_n]$. In general,
define $\hat{Y}_{t|T} = E[Y_t \mid Z_1, \ldots, Z_T]$ and $V_{t|T} = E[(Y_t - \hat{Y}_{t|T})(Y_t - \hat{Y}_{t|T})']$. We assume that
the filtered estimates $\hat{Y}_{t|t}$ and their error covariance matrices $V_{t|t}$, for $t = 1, \ldots, n$, have
already been obtained by the Kalman filter equations. Then, the optimal smoothed estimates
are obtained by the (backward) recursive relations, in which the filtered estimate $\hat{Y}_{t|t}$ is
updated, as

$$\hat{Y}_{t|n} = \hat{Y}_{t|t} + A_t(\hat{Y}_{t+1|n} - \hat{Y}_{t+1|t}) \tag{5.5.11}$$

where

$$A_t = V_{t|t}\Phi_{t+1}'V_{t+1|t}^{-1} \equiv \mathrm{cov}[Y_t, Y_{t+1} - \hat{Y}_{t+1|t}]\{\mathrm{cov}[Y_{t+1} - \hat{Y}_{t+1|t}]\}^{-1} \tag{5.5.12}$$

and

$$V_{t|n} = V_{t|t} - A_t(V_{t+1|t} - V_{t+1|n})A_t' \tag{5.5.13}$$

The result (5.5.11) is established from the following argument. First, consider $u_t =
E[Y_t \mid Z_1, \ldots, Z_t, Y_{t+1} - \hat{Y}_{t+1|t}, N_{t+1}, a_{t+2}, N_{t+2}, \ldots, a_n, N_n]$. Then, because $\{a_{t+j}, j \ge 2\}$
and $\{N_{t+j}, j \ge 1\}$ are independent of the other conditioning variables in the definition
of $u_t$ and are also independent of $Y_t$, we have $u_t = \hat{Y}_{t|t} + E[Y_t \mid Y_{t+1} - \hat{Y}_{t+1|t}] =
\hat{Y}_{t|t} + A_t(Y_{t+1} - \hat{Y}_{t+1|t})$, where $A_t$ is given by (5.5.12). Thus, because the conditioning
variables in $u_t$ generate $Z_1, \ldots, Z_n$, it follows that

$$\hat{Y}_{t|n} = E[Y_t \mid Z_1, \ldots, Z_n] = E[u_t \mid Z_1, \ldots, Z_n] = \hat{Y}_{t|t} + A_t(\hat{Y}_{t+1|n} - \hat{Y}_{t+1|t})$$

as in (5.5.11). The relation (5.5.13) for the error covariance matrix follows from rather
straightforward calculations. This derivation of the fixed-interval smoothing relations is
given by Ansley and Kohn (1982).

Thus, it is seen from (5.5.11)-(5.5.13) that the optimal smoothed estimates $\hat{Y}_{t|n}$ are
obtained by first obtaining the filtered values $\hat{Y}_{t|t}$ through the forward recursion of the
Kalman filter relations, followed by the backward recursions of (5.5.11)-(5.5.13) for $t =
n - 1, \ldots, 1$. This type of smoothing procedure has applications for estimation of trend and
seasonal components (seasonal adjustment) in economic time series, as will be discussed
in Section 9.4. When smoothed estimates $\hat{Y}_{t|n}$ are desired only at a fixed time point (or
only at a few fixed points), for example, in relation to problems that involve the estimation
of isolated missing values in a time series, then an alternative “fixed-point” smoothing
algorithm may be useful (e.g., see Anderson and Moore (1979) or Brockwell and Davis
(1991)).


Example. As a simple example of the state-space model and associated Kalman filtering
and smoothing, consider a basic structural model in which an observed series $Z_t$ is viewed
as the sum of unobserved trend and noise components. To be specific, assume that the
observed process can be represented as

$$Z_t = \mu_t + N_t \qquad \text{where} \qquad \mu_t = \mu_{t-1} + a_t$$

so that $\mu_t$ is a random walk process and $N_t$ is an independent (white) noise process. This
is a simple example of a time-invariant state-space model with $\Phi = 1$ and $H = 1$ in (5.5.4)
and (5.5.5) and with the state vector $Y_t = \mu_t$ representing an underlying (unobservable)
“trend or level” process (or “permanent” component). For this model, application of
the Kalman filter and associated smoothing algorithm can be viewed as the estimation of
the underlying trend process $\mu_t$ based on the observed process $Z_t$. The Kalman filtering
relations (5.5.6)-(5.5.9) for this basic model reduce to

$$\hat{\mu}_{t|t} = \hat{\mu}_{t-1|t-1} + K_t(Z_t - \hat{\mu}_{t-1|t-1}) = K_t Z_t + (1 - K_t)\hat{\mu}_{t-1|t-1}$$

where the gain is $K_t = V_{t|t-1}[V_{t|t-1} + \sigma_N^2]^{-1}$, with

$$V_{t+1|t} = V_{t|t-1} - V_{t|t-1}[V_{t|t-1} + \sigma_N^2]^{-1}V_{t|t-1} + \sigma_a^2$$

Then $\hat{\mu}_{t|t}$ represents the current estimate of the trend component $\mu_t$ given the observations
$Z_1, \ldots, Z_t$ through time $t$. The steady-state solution to the Kalman filter relations is obtained
as $t \to \infty$ for $V$ ($V = \lim_{t\to\infty}V_{t+1|t}$), which satisfies $V = V - V[V + \sigma_N^2]^{-1}V + \sigma_a^2$, that
is, $V[V + \sigma_N^2]^{-1}V = \sigma_a^2$, and the corresponding steady-state gain is $K = V[V + \sigma_N^2]^{-1}$.
In addition, the recursion (5.5.11) for the smoothed estimate of the trend component $\mu_t$
becomes

$$\hat{\mu}_{t|n} = \hat{\mu}_{t|t} + A_t(\hat{\mu}_{t+1|n} - \hat{\mu}_{t+1|t}) = (1 - A_t)\hat{\mu}_{t|t} + A_t\hat{\mu}_{t+1|n} \qquad t = n - 1, \ldots, 1$$

noting that $\hat{\mu}_{t+1|t} = \hat{\mu}_{t|t}$, where $A_t = V_{t|t}V_{t+1|t}^{-1} = V_{t|t}\{V_{t|t} + \sigma_a^2\}^{-1}$ and $V_{t|t} =
(1 - K_t)V_{t|t-1}$, with the recursion for the calculation of $V_{t|t-1}$ being as given above. Thus,
the smoothed value is a weighted average of the filtered estimate $\hat{\mu}_{t|t}$ at time $t$ and the
smoothed estimate $\hat{\mu}_{t+1|n}$ at time $t + 1$. The steady-state form of this smoothing recursion is
the same as above with a constant $A = \lim_{t\to\infty}A_t$, which can be found to equal $A = 1 - K$.

Hence, the steady-state (backward) smoothing relation (5.5.11) for this example has the
same form as the steady-state filter relation already mentioned; that is, they both have the
form of an exponentially weighted moving average (EWMA) with the same weight.
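
The filtering and smoothing recursions of this example are short enough to write out in full. The following Python sketch is illustrative only: the function names, the starting values mu0 and V0, and the closed-form expression used for the steady-state gain (the positive root of $V^2 - \sigma_a^2 V - \sigma_a^2\sigma_N^2 = 0$, which is equivalent to the steady-state equation above) are assumptions introduced here.

```python
import math


def local_level_filter(Z, sigma2_a, sigma2_N, mu0=0.0, V0=1e6):
    """Kalman filter for Z_t = mu_t + N_t, mu_t = mu_{t-1} + a_t.

    Returns filtered estimates, filtered variances V_{t|t}, and gains K_t.
    mu0 and V0 are assumed starting values; a large V0 expresses vague prior knowledge.
    """
    mu, V = mu0, V0
    filtered, V_filtered, gains = [], [], []
    for z in Z:
        V_pred = V + sigma2_a              # V_{t|t-1}, since Phi = 1
        K = V_pred / (V_pred + sigma2_N)   # Kalman gain
        mu = mu + K * (z - mu)             # update with the innovation
        V = (1 - K) * V_pred               # V_{t|t}
        filtered.append(mu)
        V_filtered.append(V)
        gains.append(K)
    return filtered, V_filtered, gains


def local_level_smoother(filtered, V_filtered, sigma2_a):
    """Backward fixed-interval smoothing recursion for the same model."""
    smoothed = list(filtered)
    for t in range(len(filtered) - 2, -1, -1):
        A = V_filtered[t] / (V_filtered[t] + sigma2_a)
        smoothed[t] = (1 - A) * filtered[t] + A * smoothed[t + 1]
    return smoothed


def steady_state_gain(sigma2_a, sigma2_N):
    """Steady-state K = V / (V + sigma_N^2), V solving V^2 / (V + sigma_N^2) = sigma_a^2."""
    V = (sigma2_a + math.sqrt(sigma2_a**2 + 4 * sigma2_a * sigma2_N)) / 2
    return V / (V + sigma2_N)
```

With $\sigma_a^2 = 1$ and $\sigma_N^2 = 2$, for instance, the steady-state prediction variance is $V = 2$ and the gain is $K = 0.5$, so that the smoothing weight is $A = 1 - K = 0.5$ as well.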
5.6 SUMMARY

The results of this chapter may be summarized as follows: Let $z_t$ be the deviation of an
observed time series from any known deterministic function of time $f(t)$. In particular, for
a stationary series, $f(t)$ could be equal to $\mu$, the mean of the series, or it could be equal to
zero, so that $z_t$ was the observed series. Then, consider the general ARIMA model

$$\phi(B)\nabla^d z_t = \theta(B)a_t$$

or

$$\varphi(B)z_t = \theta(B)a_t$$

Minimum Mean Square Error Forecast. Given the knowledge of the series up to some
origin $t$, the minimum mean square error forecast $\hat{z}_t(l)$ $(l > 0)$ of $z_{t+l}$ is the conditional
expectation

$$\hat{z}_t(l) = [z_{t+l}] = E[z_{t+l} \mid z_t, z_{t-1}, \ldots]$$

Lead 1 Forecast Errors. A necessary consequence is that the lead 1 forecast errors are the
generating $a_t$'s in the model and are uncorrelated.

Calculation of the Forecasts. It is usually simplest in practice to compute the forecasts
directly from the difference equation to give

$$\hat{z}_t(l) = \varphi_1[z_{t+l-1}] + \cdots + \varphi_{p+d}[z_{t+l-p-d}] + [a_{t+l}] - \theta_1[a_{t+l-1}] - \cdots - \theta_q[a_{t+l-q}] \tag{5.6.1}$$

The conditional expectations in (5.6.1) are evaluated by inserting actual $z$'s when these are
known, forecasted $z$'s for future values, actual $a$'s when these are known, and zeros for
future $a$'s. The forecasting process may be initiated by approximating $a$'s by zeros. In
practice, the appropriate form for the model and suitable estimates for the parameters are
obtained by methods set out in Chapters 6-8.

Probability Limits for Forecasts. The probability limits may be obtained as follows:

1. By first calculating the $\psi$ weights from

$$\begin{aligned} \psi_0 &= 1 \\ \psi_1 &= \varphi_1 - \theta_1 \\ \psi_2 &= \varphi_1\psi_1 + \varphi_2 - \theta_2 \\ &\;\;\vdots \\ \psi_j &= \varphi_1\psi_{j-1} + \cdots + \varphi_{p+d}\psi_{j-p-d} - \theta_j \end{aligned} \tag{5.6.2}$$

where $\theta_j = 0$, $j > q$.

2. For each desired level of probability $\varepsilon$, and for each lead time $l$, substituting in

$$z_{t+l}(\pm) = \hat{z}_t(l) \pm u_{\varepsilon/2}\left(1 + \sum_{j=1}^{l-1}\psi_j^2\right)^{1/2}\sigma_a \tag{5.6.3}$$

where in practice $\sigma_a$ is replaced by an estimate $s_a$ of the standard deviation of the
white noise process $a_t$, and $u_{\varepsilon/2}$ is the deviate exceeded by a proportion $\varepsilon/2$ of the
unit normal distribution.
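
The two steps above translate directly into code. The following sketch is a minimal illustration, assuming the generalized autoregressive coefficients $\varphi_1, \ldots, \varphi_{p+d}$ and the moving average coefficients are supplied as lists; the function names and the default deviate 1.96 (corresponding to 95% limits) are choices made here.

```python
import math


def psi_weights(varphi, theta, n):
    """psi_0, ..., psi_n from the recursion (5.6.2)."""
    psi = [1.0]
    for j in range(1, n + 1):
        ar_part = sum(varphi[i - 1] * psi[j - i] for i in range(1, min(j, len(varphi)) + 1))
        ma_part = theta[j - 1] if j <= len(theta) else 0.0   # theta_j = 0 for j > q
        psi.append(ar_part - ma_part)
    return psi


def probability_limits(forecast, psi, lead, sigma_a, deviate=1.96):
    """Lower and upper limits (5.6.3) for a forecast at the given lead time."""
    half_width = deviate * math.sqrt(sum(p * p for p in psi[:lead])) * sigma_a
    return forecast - half_width, forecast + half_width


# (1 - 0.8B)(1 - B) z_t = a_t has generalized AR operator 1 - 1.8B + 0.8B^2
psi_series_c = psi_weights([1.8, -0.8], [], 5)
```

Because psi[:lead] holds $\psi_0, \ldots, \psi_{l-1}$, the sum of squares inside the square root equals $1 + \sum_{j=1}^{l-1}\psi_j^2$, and the limits widen with the lead time.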

Updating the Forecasts. When a new deviation $z_{t+1}$ comes to hand, the forecasts may
be updated to origin $t + 1$, by calculating the new forecast error $a_{t+1} = z_{t+1} - \hat{z}_t(1)$ and
using the difference equation (5.6.1) with $t + 1$ replacing $t$. However, an alternative method
is to use the forecasts $\hat{z}_t(1), \hat{z}_t(2), \ldots, \hat{z}_t(L)$ at origin $t$, to obtain the first $L - 1$ forecasts
$\hat{z}_{t+1}(1), \hat{z}_{t+1}(2), \ldots, \hat{z}_{t+1}(L - 1)$ at origin $t + 1$, from

$$\hat{z}_{t+1}(l) = \hat{z}_t(l + 1) + \psi_l a_{t+1} \tag{5.6.4}$$

and then generate the last forecast $\hat{z}_{t+1}(L)$ using the difference equation (5.6.1).

Other Ways of Expressing the Forecasts. The above is all that is needed for practical
utilization of the forecasts. However, the following alternative forms provide theoretical
insight into the nature of the forecasts generated by different models:

1. Forecasts in Integrated Form. For $l > q - p - d$, the forecasts lie on the unique curve

$$\hat{z}_t(l) = b_0^{(t)}f_0(l) + b_1^{(t)}f_1(l) + \cdots + b_{p+d-1}^{(t)}f_{p+d-1}(l) \tag{5.6.5}$$

determined by the “pivotal” values $\hat{z}_t(q), \hat{z}_t(q - 1), \ldots, \hat{z}_t(q - p - d + 1)$, where
$\hat{z}_t(-j) = z_{t-j}$ $(j = 0, 1, 2, \ldots)$. If $q > p + d$, the first $q - p - d$ forecasts do not lie
on this curve. In general, the stationary autoregressive operator contributes damped
exponential and damped sine wave terms to (5.6.5), and the nonstationary operator
contributes polynomial terms up to degree $d - 1$.

The adaptive coefficients $b_j^{(t)}$ in (5.6.5) may be updated from origin $t$ to $t + 1$ by
amounts depending on the last lead 1 forecast error $a_{t+1}$, according to the general
formula

$$b^{(t+1)} = L'b^{(t)} + g\,a_{t+1} \tag{5.6.6}$$

given in Appendix A5.3. Specific examples of the updating are given in (5.4.5) and
(5.4.13) for the IMA(0,1,1) and IMA(0,2,2) processes, respectively.

2. Forecasts as a Weighted Sum of Past Observations. It is instructive from a theoretical
point of view to express the forecasts as a weighted sum of past observations. Thus,
if the model is written in inverted form,

$$a_t = \pi(B)z_t = (1 - \pi_1 B - \pi_2 B^2 - \cdots)z_t$$

the lead 1 forecast is

$$\hat{z}_t(1) = \pi_1 z_t + \pi_2 z_{t-1} + \cdots \tag{5.6.7}$$

and the forecasts for longer lead times may be obtained from

$$\hat{z}_t(l) = \pi_1[z_{t+l-1}] + \pi_2[z_{t+l-2}] + \cdots \tag{5.6.8}$$

where the conditional expectations in (5.6.8) are evaluated by replacing $z$'s by actual
values when known, and by forecasted values when unknown.

Alternatively, the forecast for any lead time may be written as a linear function of
the available observations. Thus,

$$\hat{z}_t(l) = \sum_{j=1}^{\infty}\pi_j^{(l)}z_{t+1-j}$$

where the $\pi_j^{(l)}$ are functions of the $\pi_j$'s.

Role of Constant Term in Forecasts. The forecasts will be impacted by the allowance
of a nonzero constant term $\theta_0$ in the ARIMA($p$, $d$, $q$) model, $\varphi(B)z_t = \theta_0 + \theta(B)a_t$, where
$\varphi(B) = \phi(B)\nabla^d$. Then, in (5.3.3) and (5.6.5), an additional deterministic polynomial term
of degree $d$, $(\mu_w/d!)l^d$ with $w_t = \nabla^d z_t$ and $\mu_w = E[w_t] = \theta_0/(1 - \phi_1 - \phi_2 - \cdots - \phi_p)$,
will be present. This follows because in place of the relation $\varphi(B)\hat{z}_t(l) = 0$ in (5.3.2),
the forecasts now satisfy $\varphi(B)\hat{z}_t(l) = \theta_0$, $l > q$, and the deterministic polynomial term
of degree $d$ represents a particular solution to this nonhomogeneous difference equation.
Hence, in the instance of a nonzero constant term $\theta_0$, the ARIMA model is also expressible
as $\phi(B)(\nabla^d z_t - \mu_w) = \theta(B)a_t$, $\mu_w \ne 0$, and the forecast in the form (5.6.5) may be viewed
as representing the forecast value of $\tilde{z}_{t+l} = z_{t+l} - f(t + l)$, where $f(t + l) = (\mu_w/d!)(t + l)^d + g(t + l)$
and $g(t)$ is any fixed deterministic polynomial in $t$ of degree less than or
equal to $d - 1$ (including the possibility $g(t) = 0$). For example, in an ARIMA model with
$d = 1$ such as the ARIMA(1,1,1) model example of Section 5.4.6, but with $\theta_0 \ne 0$, the
eventual forecast function of the form $\hat{z}_t(l) = b_0^{(t)} + b_1^{(t)}\phi^l$ will now contain the additional
deterministic linear trend term $\mu_w l$, where $\mu_w = \theta_0/(1 - \phi)$, similar to the result in the
example for the ARIMA(1,1,0) model in (5.4.17). Note that in the special case of a
stationary process $z_t$, with $d = 0$, the additional deterministic term in (5.3.3) reduces to the
mean of the process $z_t$, $\mu = E[z_t]$.


APPENDIX A5.1 CORRELATION BETWEEN FORECAST ERRORS 

A5.1.1 Autocorrelation Function of Forecast Errors at Different Origins 

Although it is true that for an optimal forecast the forecast errors for lead time 1 will be 
uncorrelated, this will not generally be true of forecasts at longer lead times. Consider 
forecasts for lead times $l$, made at origins $t$ and $t - j$, respectively, where $j$ is a positive
integer. Then, if $j = l, l + 1, l + 2, \ldots$, the forecast errors will contain no common
component, but for $j = 1, 2, \ldots, l - 1$, certain of the $a$'s will be included in both forecast errors.
Specifically,

$$e_t(l) = z_{t+l} - \hat{z}_t(l) = a_{t+l} + \psi_1 a_{t+l-1} + \cdots + \psi_{l-1}a_{t+1}$$

$$e_{t-j}(l) = z_{t-j+l} - \hat{z}_{t-j}(l) = a_{t-j+l} + \psi_1 a_{t-j+l-1} + \cdots + \psi_{l-1}a_{t-j+1}$$

and for $j < l$, the lag $j$ autocovariance of the forecast errors for lead time $l$ is

$$E[e_t(l)e_{t-j}(l)] = \sigma_a^2\sum_{i=j}^{l-1}\psi_i\psi_{i-j} \tag{A5.1.1}$$

where $\psi_0 = 1$. The corresponding autocorrelations are

$$\rho[e_t(l), e_{t-j}(l)] = \begin{cases} \dfrac{\sum_{i=j}^{l-1}\psi_i\psi_{i-j}}{\sum_{i=0}^{l-1}\psi_i^2} & 0 \le j < l \\[2ex] 0 & j \ge l \end{cases} \tag{A5.1.2}$$

TABLE A5.1 Autocorrelations of Forecast Errors at Lead 6 for Series C

j                             0     1     2     3     4     5     6
$\rho[e_t(6), e_{t-j}(6)]$    1.00  0.81  0.61  0.41  0.23  0.08  0.00


We show in Chapter 7 that Series C of Figure 4.1 is well fitted by the (1,1,0) model
$(1 - 0.8B)\nabla z_t = a_t$. To illustrate (A5.1.2), we calculate the autocorrelation function of the
forecast errors at lead time 6 for this model. It follows from Section 5.2.1 that the $\psi$ weights
$\psi_1, \psi_2, \ldots, \psi_5$ for this model are 1.80, 2.44, 2.95, 3.36, and 3.69, respectively. Thus, for
example, the lag 1 autocovariance is

$$E[e_t(6)e_{t-1}(6)] = \sigma_a^2[(1.80 \times 1.00) + (2.44 \times 1.80) + \cdots + (3.69 \times 3.36)] = 35.70\sigma_a^2$$

On dividing by $E[e_t^2(6)] = 43.86\sigma_a^2$, we obtain $\rho[e_t(6), e_{t-1}(6)] = 0.81$. The first six
autocorrelations are shown in Table A5.1 and plotted in Figure A5.1(a). As expected, the
autocorrelations beyond the fifth are zero.
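
The entries of Table A5.1 can be recomputed from (A5.1.2). The following sketch is illustrative; the function names are hypothetical, and because it uses unrounded $\psi$ weights the results may differ from the tabulated values in the second decimal place.

```python
def series_c_psi(n):
    """psi_0..psi_n for (1 - 0.8B)(1 - B) z_t = a_t: psi_j = 1 + 0.8 + ... + 0.8**j."""
    return [sum(0.8**k for k in range(j + 1)) for j in range(n + 1)]


def forecast_error_autocorrelation(psi, lead, lag):
    """Autocorrelation (A5.1.2) between lead-`lead` forecast errors made `lag` origins apart."""
    if lag >= lead:
        return 0.0                      # no shocks in common
    numerator = sum(psi[i] * psi[i - lag] for i in range(lag, lead))
    denominator = sum(psi[i] ** 2 for i in range(lead))
    return numerator / denominator


psi = series_c_psi(5)
lead6_autocorrelations = [forecast_error_autocorrelation(psi, 6, j) for j in range(7)]
```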


A5.1.2 Correlation Between Forecast Errors at the Same Origin with Different 
Lead Times 

Suppose that we make a series of forecasts for different lead times from the same fixed
origin $t$. Then, the errors for these forecasts will be correlated. We have for $j = 1, 2, 3, \ldots$,

$$e_t(l) = z_{t+l} - \hat{z}_t(l) = a_{t+l} + \psi_1 a_{t+l-1} + \cdots + \psi_{l-1}a_{t+1}$$

$$e_t(l + j) = z_{t+l+j} - \hat{z}_t(l + j) = a_{t+l+j} + \psi_1 a_{t+l+j-1} + \cdots + \psi_{j-1}a_{t+l+1} + \psi_j a_{t+l} + \psi_{j+1}a_{t+l-1} + \cdots + \psi_{l+j-1}a_{t+1}$$

so that the covariance between the $t$-origin forecast errors at lead times $l$ and $l + j$ is
$\sigma_a^2\sum_{i=0}^{l-1}\psi_i\psi_{i+j}$, where $\psi_0 = 1$.

Thus, the correlation coefficient between the $t$-origin forecast errors at lead times $l$ and
$l + j$ is

$$\rho[e_t(l), e_t(l + j)] = \frac{\sum_{i=0}^{l-1}\psi_i\psi_{i+j}}{\left(\sum_{h=0}^{l-1}\psi_h^2\,\sum_{g=0}^{l+j-1}\psi_g^2\right)^{1/2}} \tag{A5.1.3}$$

FIGURE A5.1 Correlations between various forecast errors for Series C. (a) Autocorrelations of
forecast errors for Series C from different origins at lead time $l = 6$. (b) Correlations between forecast
errors for Series C from the same origin at lead time 3 and lead time $j$.

To illustrate (A5.1.3), we compute, for forecasts made from the same origin, the cor¬ 
relation between the forecast error at lead time 3 and the forecast errors at lead times 
j = 1,2,3,4,..., 16 for Series C. For example, using (A5.1.3) and the i/r weights given in 
Section 5.2.2, 

2 

E[e,(3)e,(5)] = o 2 a ^ w +2 = + VW 3 + ViVd 

(=0 

= a; [(1.00 X 2.44) + (1.80 X 2.95) + (2.44 X 3.36)] 

= 15.94(7; 

The correlations for lead times j = 1,2,..., 16 are shown in Table A5.2 and plotted in 
Figure A5.1(b). As is to be expected, forecasts made from the same origin at different lead 
times are highly correlated. 
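
As a rough numerical check of (A5.1.3), the following R sketch computes the $\psi$ weights by the usual recursion and then the correlations between the lead 3 error and the errors at longer leads. It assumes that Series C follows the model $(1 - 0.8B)(1 - B)z_t = a_t$, which reproduces the weights 1.80, 2.44, 2.95, ... quoted above; the function names `psi_weights` and `rho_same_origin` are illustrative only.

```r
# psi weights psi_0, ..., psi_n from the recursion
#   psi_j = varphi_1 psi_{j-1} + ... + varphi_{p+d} psi_{j-p-d} - theta_j,
# with psi_0 = 1 and theta_j = 0 for j > q.
psi_weights <- function(varphi, theta = numeric(0), n) {
  psi <- numeric(n + 1)
  psi[1] <- 1                                  # psi_0 (R vectors are 1-based)
  for (j in seq_len(n)) {
    acc <- if (j <= length(theta)) -theta[j] else 0
    for (i in seq_along(varphi)) {
      if (j - i >= 0) acc <- acc + varphi[i] * psi[j - i + 1]
    }
    psi[j + 1] <- acc
  }
  psi
}

# Correlation between the origin-t forecast errors at leads l and l + j, (A5.1.3).
rho_same_origin <- function(psi, l, j) {
  i <- 0:(l - 1)
  num <- sum(psi[i + 1] * psi[i + j + 1])
  den <- sqrt(sum(psi[1:l]^2) * sum(psi[1:(l + j)]^2))
  num / den
}

# Assumed model for Series C: (1 - 0.8B)(1 - B) z_t = a_t,
# so that varphi(B) = 1 - 1.8B + 0.8B^2.
psi <- psi_weights(varphi = c(1.8, -0.8), n = 16)

# Lead 3 paired with leads 3, 4, ..., 16 (j = 0 gives correlation 1).
rho <- sapply(0:13, function(j) rho_same_origin(psi, l = 3, j = j))
print(round(rho, 2))
```

Under this assumption the rounded values for leads 4 and 5 come out as 0.96 and 0.91, in agreement with the corresponding entries of Table A5.2. Correlations with leads 1 and 2 follow from the same function by exchanging the roles of the two lead times, for example `rho_same_origin(psi, l = 1, j = 2)`.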


APPENDIX A5.2 FORECAST WEIGHTS FOR ANY LEAD TIME 

In this appendix we consider an alternative procedure for calculating the forecast weights
$\pi_j^{(l)}$ applied to previous z's for any lead time l. To derive this result, we make use of the
identity (3.1.7), namely,

$$(1 + \psi_1 B + \psi_2 B^2 + \cdots)(1 - \pi_1 B - \pi_2 B^2 - \cdots) = 1$$

from which the $\pi$ weights may be obtained in terms of the $\psi$ weights, and vice versa.
TABLE A5.2 Correlation Between Forecast Errors at Lead 3
and at Lead j Made from a Fixed Origin for Series C

 j    $\rho[e_t(3), e_t(j)]$     j    $\rho[e_t(3), e_t(j)]$
 1    0.76                       9    0.71
 2    0.94                      10    0.67
 3    1.00                      11    0.63
 4    0.96                      12    0.60
 5    0.91                      13    0.57
 6    0.85                      14    0.54
 7    0.80                      15    0.52
 8    0.75                      16    0.50


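On equating coefficients, we find, for $j \geq 1$,

$$\psi_j = \sum_{i=1}^{j} \pi_i \psi_{j-i} \qquad (\psi_0 = 1) \tag{A5.2.1}$$

Thus, for example,

$$\psi_1 = \pi_1 \qquad\qquad\qquad\qquad\quad \pi_1 = \psi_1$$

$$\psi_2 = \pi_1\psi_1 + \pi_2 \qquad\qquad\qquad \pi_2 = \psi_2 - \psi_1\pi_1$$

$$\psi_3 = \pi_1\psi_2 + \pi_2\psi_1 + \pi_3 \qquad \pi_3 = \psi_3 - \psi_1\pi_2 - \psi_2\pi_1$$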


Now from (5.3.6),

$$\hat{z}_t(l) = \pi_1 \hat{z}_t(l-1) + \pi_2 \hat{z}_t(l-2) + \cdots + \pi_{l-1} \hat{z}_t(1) + \pi_l z_t + \pi_{l+1} z_{t-1} + \cdots \tag{A5.2.2}$$

Since each of the forecasts in (A5.2.2) is itself a function of the observations
$z_t, z_{t-1}, z_{t-2}, \ldots$, we can write

$$\hat{z}_t(l) = \pi_1^{(l)} z_t + \pi_2^{(l)} z_{t-1} + \pi_3^{(l)} z_{t-2} + \cdots$$

where the lead l forecast weights may be calculated from the lead 1 forecast weights
$\pi_j^{(1)} = \pi_j$. We now show that the weights $\pi_j^{(l)}$ can be obtained using the identity

$$\pi_j^{(l)} = \sum_{i=1}^{l} \psi_{l-i} \pi_{i+j-1} = \pi_{j+l-1} + \psi_1 \pi_{j+l-2} + \cdots + \psi_{l-1} \pi_j \tag{A5.2.3}$$

For example, the weights for the forecast at lead time 3 are

$$\pi_1^{(3)} = \pi_3 + \psi_1\pi_2 + \psi_2\pi_1$$

$$\pi_2^{(3)} = \pi_4 + \psi_1\pi_3 + \psi_2\pi_2$$

$$\pi_3^{(3)} = \pi_5 + \psi_1\pi_4 + \psi_2\pi_3$$
and so on. To derive (A5.2.3), we write

$$\hat{z}_t(l) = \psi_l a_t + \psi_{l+1} a_{t-1} + \cdots$$

$$\hat{z}_{t+l-1}(1) = \psi_1 a_{t+l-1} + \cdots + \psi_l a_t + \psi_{l+1} a_{t-1} + \cdots$$

On subtraction, we obtain

$$\hat{z}_t(l) = \hat{z}_{t+l-1}(1) - \psi_1 a_{t+l-1} - \psi_2 a_{t+l-2} - \cdots - \psi_{l-1} a_{t+1}$$

Hence,

$$\hat{z}_t(l) = \pi_1 z_{t+l-1} + \pi_2 z_{t+l-2} + \cdots + \pi_{l-1} z_{t+1} + \pi_l z_t + \pi_{l+1} z_{t-1} + \cdots$$

$$\quad + \psi_1(-z_{t+l-1} + \pi_1 z_{t+l-2} + \cdots + \pi_{l-2} z_{t+1} + \pi_{l-1} z_t + \pi_l z_{t-1} + \cdots)$$

$$\quad + \psi_2(-z_{t+l-2} + \pi_1 z_{t+l-3} + \cdots + \pi_{l-3} z_{t+1} + \pi_{l-2} z_t + \pi_{l-1} z_{t-1} + \cdots)$$

$$\quad + \cdots$$

$$\quad + \psi_{l-1}(-z_{t+1} + \pi_1 z_t + \pi_2 z_{t-1} + \cdots)$$

Using the relation (A5.2.1), each one of the coefficients of $z_{t+l-1}, \ldots, z_{t+1}$ is seen to
vanish, as they should, and on collecting terms, we obtain the required result (A5.2.3).
Alternatively, we may use the formula in the recursive form

$$\pi_j^{(l)} = \pi_{j+1}^{(l-1)} + \psi_{l-1} \pi_j \tag{A5.2.4}$$

Using the model $\nabla^2 z_t = (1 - 0.9B + 0.5B^2)a_t$ for illustration, we calculate the weights for
lead time 2. Equation (A5.2.4) gives

$$\pi_j^{(2)} = \pi_{j+1} + \psi_1 \pi_j$$

and using the weights in Table 5.2, with $\psi_1 = 1.1$ we have, for example,

$$\pi_1^{(2)} = \pi_2 + \psi_1\pi_1 = 0.490 + (1.1)(1.1) = 1.700$$

$$\pi_2^{(2)} = \pi_3 + \psi_1\pi_2 = -0.109 + (1.1)(0.49) = 0.430$$

and so on. The first 12 weights have been given in Table 5.2.
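
The calculation can be sketched in R as follows. The $\psi$ weights are generated by the recursion of (5.2.3), the $\pi$ weights are then recovered from (A5.2.1), and the lead l weights are formed from (A5.2.3). The variable and function names (`pi_w`, `lead_weights`) are illustrative and not part of any package.

```r
# psi weights for del^2 z_t = (1 - 0.9B + 0.5B^2) a_t:
# varphi(B) = (1 - B)^2 = 1 - 2B + B^2, theta_1 = 0.9, theta_2 = -0.5.
varphi <- c(2, -1)
theta  <- c(0.9, -0.5)
n <- 13

psi <- numeric(n)                      # psi[j] holds psi_j, j = 1, ..., n
for (j in 1:n) {
  acc <- if (j <= length(theta)) -theta[j] else 0
  for (i in seq_along(varphi)) {
    prev <- if (j - i == 0) 1 else if (j - i > 0) psi[j - i] else 0
    acc <- acc + varphi[i] * prev
  }
  psi[j] <- acc
}

# pi weights from (A5.2.1): pi_j = psi_j - sum_{i=1}^{j-1} pi_i psi_{j-i}.
pi_w <- numeric(n)
for (j in 1:n) {
  s <- 0
  if (j > 1) for (i in 1:(j - 1)) s <- s + pi_w[i] * psi[j - i]
  pi_w[j] <- psi[j] - s
}

# Lead-l weights from (A5.2.3):
# pi_j^(l) = pi_{j+l-1} + psi_1 pi_{j+l-2} + ... + psi_{l-1} pi_j.
lead_weights <- function(l, m) {
  psi0 <- c(1, psi)                    # psi0[h + 1] holds psi_h, with psi_0 = 1
  sapply(1:m, function(j) {
    h <- 0:(l - 1)
    sum(psi0[h + 1] * pi_w[j + l - 1 - h])
  })
}

pi_lead2 <- lead_weights(l = 2, m = 12)
print(round(pi_w[1:3], 3))
print(round(pi_lead2[1:2], 3))
```

The first three $\pi$ weights should agree with the values 1.1, 0.490, and -0.109 used above, and the first two lead 2 weights with 1.700 and 0.430.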


APPENDIX A5.3 FORECASTING IN TERMS OF THE GENERAL 
INTEGRATED FORM 

A5.3.1 General Method of Obtaining the Integrated Form 

We emphasize once more that for practical computation of the forecasts, the difference 
equation procedure is by far the simplest. The following general treatment of the integrated 
form is given only to elaborate further on the forecasts obtained. In this treatment, rather 
than solving explicitly for the forecast function as we did in the examples given in Section 
5.4, it will be appropriate to write down the general form of the eventual forecast function 
involving p + d adaptive coefficients. We then show how the eventual forecast function 
needs to be modified to deal with the first q — p — d forecasts if q > p + d. Finally, we 
show how to update the adaptive coefficients from origin t to origin t + 1. 




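If it is understood that $\hat{z}_t(-j) = z_{t-j}$ for j = 0, 1, 2, ..., then using the conditional
expectation argument of Section 5.1.1, the forecasts satisfy the difference equation:

$$\hat{z}_t(1) - \varphi_1 \hat{z}_t(0) - \cdots - \varphi_{p+d} \hat{z}_t(1-p-d) = -\theta_1 a_t - \cdots - \theta_q a_{t-q+1}$$

$$\hat{z}_t(2) - \varphi_1 \hat{z}_t(1) - \cdots - \varphi_{p+d} \hat{z}_t(2-p-d) = -\theta_2 a_t - \cdots - \theta_q a_{t-q+2}$$

$$\vdots$$

$$\hat{z}_t(q) - \varphi_1 \hat{z}_t(q-1) - \cdots - \varphi_{p+d} \hat{z}_t(q-p-d) = -\theta_q a_t$$

$$\hat{z}_t(l) - \varphi_1 \hat{z}_t(l-1) - \cdots - \varphi_{p+d} \hat{z}_t(l-p-d) = 0 \qquad l > q \tag{A5.3.1}$$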

The eventual forecast function is the solution of the last equation and may be written as

$$\hat{z}_t(l) = b_0^{(t)} f_0(l) + b_1^{(t)} f_1(l) + \cdots + b_{p+d-1}^{(t)} f_{p+d-1}(l) = \sum_{i=0}^{p+d-1} b_i^{(t)} f_i(l) \qquad l > q - p - d \tag{A5.3.2}$$

When q is less than or equal to p + d, the eventual forecast function will provide forecasts
$\hat{z}_t(1), \hat{z}_t(2), \hat{z}_t(3), \ldots$ for all lead times $l \geq 1$.

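As an example of such a model with q < p + d, suppose that

$$(1 - B)(1 - \sqrt{3}B + B^2)^2 z_t = (1 - 0.5B)a_t$$

so that p + d = 5 and q = 1. Then,

$$(1 - B)(1 - \sqrt{3}B + B^2)^2 \hat{z}_t(l) = 0 \qquad l = 2, 3, 4, \ldots$$

where B now operates on l and not on t. Solution of this difference equation yields the
forecast function

$$\hat{z}_t(l) = b_0^{(t)} + b_1^{(t)} \cos\left(\frac{2\pi l}{12}\right) + b_2^{(t)} l \cos\left(\frac{2\pi l}{12}\right) + b_3^{(t)} \sin\left(\frac{2\pi l}{12}\right) + b_4^{(t)} l \sin\left(\frac{2\pi l}{12}\right) \qquad l = 1, 2, \ldots$$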
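
A quick numerical check, sketched in R, confirms that each of the five fitting functions is annihilated by the operator $(1 - B)(1 - \sqrt{3}B + B^2)^2$ acting on the lead time l, on the understanding that the frequency is $2\pi/12$ as implied by the factor $1 - \sqrt{3}B + B^2$. The helper names `polymul` and `apply_op` are illustrative.

```r
# Multiply two polynomials in B given as coefficient vectors (constant term first).
polymul <- function(a, b) {
  out <- numeric(length(a) + length(b) - 1)
  for (i in seq_along(a)) {
    for (j in seq_along(b)) {
      out[i + j - 1] <- out[i + j - 1] + a[i] * b[j]
    }
  }
  out
}

quad <- c(1, -sqrt(3), 1)                       # 1 - sqrt(3) B + B^2
op   <- polymul(c(1, -1), polymul(quad, quad))  # (1 - B)(1 - sqrt(3) B + B^2)^2

# Apply the operator at lead l: sum_k op_k f(l - k), with B acting on l.
apply_op <- function(f, l) {
  sum(op * f(l - (seq_along(op) - 1)))
}

w <- 2 * pi / 12
basis <- list(
  function(l) rep(1, length(l)),
  function(l) cos(w * l),
  function(l) l * cos(w * l),
  function(l) sin(w * l),
  function(l) l * sin(w * l)
)

leads <- 6:30
residual <- sapply(basis, function(f) sapply(leads, function(l) apply_op(f, l)))
max_abs_residual <- max(abs(residual))
print(max_abs_residual)
```

The largest residual should be zero to within rounding error, so any linear combination of these five functions satisfies the homogeneous difference equation.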

If q is greater than p + d, then for lead times $l \leq q - p - d$, the forecast function will have
additional terms containing $a_{t-i}$'s. Thus,

$$\hat{z}_t(l) = \sum_{i=0}^{p+d-1} b_i^{(t)} f_i(l) + \sum_{i=0}^{j} d_{li} a_{t-i} \qquad l \leq q - p - d \tag{A5.3.3}$$

where j = q - p - d - l and the d's may be obtained explicitly by substituting (A5.3.3) in
(A5.3.1). For example, consider the stochastic model

$$\nabla^2 z_t = (1 - 0.8B + 0.5B^2 - 0.4B^3 + 0.1B^4)a_t$$

in which p + d = 2, q = 4, q - p - d = 2 and $\varphi_1 = 2$, $\varphi_2 = -1$, $\theta_1 = 0.8$, $\theta_2 = -0.5$,
$\theta_3 = 0.4$, and $\theta_4 = -0.1$. Using the recurrence relation (5.2.3), we obtain $\psi_1 = 1.2$, $\psi_2 = 1.9$,
$\psi_3 = 2.2$, and $\psi_4 = 2.6$. Now, from (A5.3.3),

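$$\hat{z}_t(1) = b_0^{(t)} + b_1^{(t)} + d_{10} a_t + d_{11} a_{t-1}$$

$$\hat{z}_t(2) = b_0^{(t)} + 2b_1^{(t)} + d_{20} a_t$$

$$\hat{z}_t(l) = b_0^{(t)} + b_1^{(t)} l \qquad l > 2 \tag{A5.3.4}$$

Using (A5.3.1) gives

$$\hat{z}_t(4) - 2\hat{z}_t(3) + \hat{z}_t(2) = 0.1a_t$$

so that from (A5.3.4)

$$d_{20} a_t = 0.1a_t$$

and hence $d_{20} = 0.1$. Similarly, from (A5.3.1),

$$\hat{z}_t(3) - 2\hat{z}_t(2) + \hat{z}_t(1) = -0.4a_t + 0.1a_{t-1}$$

and hence using (A5.3.4),

$$-0.2a_t + d_{10} a_t + d_{11} a_{t-1} = -0.4a_t + 0.1a_{t-1}$$

yielding

$$d_{10} = -0.2 \qquad d_{11} = 0.1$$

Hence, the forecast function is

$$\hat{z}_t(1) = b_0^{(t)} + b_1^{(t)} - 0.2a_t + 0.1a_{t-1}$$

$$\hat{z}_t(2) = b_0^{(t)} + 2b_1^{(t)} + 0.1a_t$$

$$\hat{z}_t(l) = b_0^{(t)} + b_1^{(t)} l \qquad l > 2$$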


A5.3.2 Updating the General Integrated Form 

Updating formulas for the coefficients may be obtained using the identity (5.2.5) with t + 1 
replaced by t: 


$$\hat{z}_t(l) = \hat{z}_{t-1}(l+1) + \psi_l a_t$$

Then, for l > q - p - d,

$$\sum_{i=0}^{p+d-1} b_i^{(t)} f_i(l) = \sum_{i=0}^{p+d-1} b_i^{(t-1)} f_i(l+1) + \psi_l a_t \tag{A5.3.5}$$

By solving p + d such equations for different values of l, we obtain the required updating
formula for the individual coefficients, in the form

$$b_i^{(t)} = \sum_{j=0}^{p+d-1} L_{ij} b_j^{(t-1)} + g_i a_t$$

Note that the updating of each of the coefficients of the forecast function depends only on
the lead 1 forecast error $a_t = z_t - \hat{z}_{t-1}(1)$.

A5.3.3 Comparison with the Discounted Least-Squares Method 

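Although working with the integrated form is an unnecessarily complicated way of computing
forecasts, it allows us to compare the present mean square error forecast with another
type of forecast that has received considerable attention. Let us write

$$\mathbf{F}_l = \begin{bmatrix} f_0(l) & f_1(l) & \cdots & f_{p+d-1}(l) \\ f_0(l+1) & f_1(l+1) & \cdots & f_{p+d-1}(l+1) \\ \vdots & \vdots & & \vdots \\ f_0(l+p+d-1) & f_1(l+p+d-1) & \cdots & f_{p+d-1}(l+p+d-1) \end{bmatrix}$$

$$\mathbf{b}^{(t)} = \begin{bmatrix} b_0^{(t)} \\ b_1^{(t)} \\ \vdots \\ b_{p+d-1}^{(t)} \end{bmatrix} \qquad \boldsymbol{\psi}_l = \begin{bmatrix} \psi_l \\ \psi_{l+1} \\ \vdots \\ \psi_{l+p+d-1} \end{bmatrix}$$

Then, using (A5.3.5) for l, l + 1, ..., l + p + d - 1, we obtain for l > q - p - d,

$$\mathbf{F}_l \mathbf{b}^{(t)} = \mathbf{F}_{l+1} \mathbf{b}^{(t-1)} + \boldsymbol{\psi}_l a_t$$

yielding

$$\mathbf{b}^{(t)} = (\mathbf{F}_l^{-1} \mathbf{F}_{l+1}) \mathbf{b}^{(t-1)} + (\mathbf{F}_l^{-1} \boldsymbol{\psi}_l) a_t$$

or

$$\mathbf{b}^{(t)} = \mathbf{L}' \mathbf{b}^{(t-1)} + \mathbf{g} a_t \tag{A5.3.6}$$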

Equation (A5.3.6) is of the same algebraic form as the updating function given by the
“discounted least-squares” procedure of Brown (1962) and Brown and Meyer (1961).
For comparison, if we denote the forecast error given by that method by $e_t$, then Brown’s
updating formula may be written as

$$\boldsymbol{\beta}^{(t)} = \mathbf{L}' \boldsymbol{\beta}^{(t-1)} + \mathbf{h} e_t \tag{A5.3.7}$$

where $\boldsymbol{\beta}^{(t)}$ is his vector of adaptive coefficients. The same matrix L appears in (A5.3.6) and
(A5.3.7). This is inevitable, for this first factor merely allows for changes in the coefficients
arising from translation to the new origin and would have to occur in any such formula.
For example, consider the straight line forecast function:

$$\hat{z}_{t-1}(l) = b_0^{(t-1)} + b_1^{(t-1)} l$$

where $b_0^{(t-1)}$ is the ordinate at time t - 1, the origin of the forecast. This can equally well
be written as

$$\hat{z}_{t-1}(l) = (b_0^{(t-1)} + b_1^{(t-1)}) + b_1^{(t-1)}(l - 1)$$

where now $(b_0^{(t-1)} + b_1^{(t-1)})$ is the ordinate at time t. Obviously, if we update the forecast to
origin t, the coefficient $b_0$ must be suitably adjusted even if the forecast function were to
remain unchanged.

In general, the matrix L does not change the forecast function, it merely relocates it. 
The actual updating is done by the vector of coefficients g and h. We will see that the 
coefficients g, which yield the minimum mean square error forecasts, and the coefficients 
h given by Brown are in general completely different. 
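
To make the roles of $\mathbf{L}'$ and $\mathbf{g}$ in (A5.3.6) concrete, the following R sketch evaluates $\mathbf{F}_l^{-1}\mathbf{F}_{l+1}$ and $\mathbf{F}_l^{-1}\boldsymbol{\psi}_l$ for a straight line forecast function, borrowing the $\psi$ weights of the model $\nabla^2 z_t = (1 - 0.8B + 0.5B^2 - 0.4B^3 + 0.1B^4)a_t$ from Section A5.3.1. The choice l = 3 and the helper name `F_mat` are illustrative assumptions.

```r
# Straight-line forecast function: f_0(l) = 1, f_1(l) = l, so p + d = 2.
F_mat <- function(l) rbind(c(1, l), c(1, l + 1))

# psi weights quoted in Section A5.3.1, extended by
# psi_j = 2 psi_{j-1} - psi_{j-2} for j > 4 (theta_j = 0 beyond lag q = 4).
psi <- c(1.2, 1.9, 2.2, 2.6)
psi[5] <- 2 * psi[4] - psi[3]

l <- 3                                       # any l > q - p - d = 2 will do
L_t <- solve(F_mat(l), F_mat(l + 1))         # L' = F_l^{-1} F_{l+1}
g   <- solve(F_mat(l), psi[c(l, l + 1)])     # g  = F_l^{-1} psi_l

print(L_t)
print(g)
```

Here $\mathbf{L}'$ maps $(b_0, b_1)$ to $(b_0 + b_1, b_1)$, which is exactly the relocation of the ordinate described for the straight line, while all dependence on the stochastic model enters through $\mathbf{g}$. Repeating the calculation with l = 4 gives the same $\mathbf{g}$, as it should.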

Brown’s Method of Forecasting. 

1. A forecast function is selected from the general class of linear combinations and 
products of polynomials, exponentials, and sines and cosines. 

2. The selected forecast function is fitted to past values by a “discounted least-squares”
procedure. In this procedure, the coefficients are estimated and updated so that the
sum of squares of weighted discrepancies

$$S_\omega = \sum_{j=0}^{\infty} \omega_j [z_{t-j} - \hat{z}_t(-j)]^2 \tag{A5.3.8}$$

between past values of the series and the values given by the forecast function at the
corresponding past times is minimized. The weight function $\omega_j$ is chosen arbitrarily
to fall off geometrically, so that $\omega_j = (1 - \alpha)^j$, where the constant $\alpha$, usually called
the smoothing constant, is (again arbitrarily) set equal to a value in the range 0.1-0.3.

Difference between the Minimum Mean Square Error Forecasts and those of Brown. 
To illustrate these comments, consider the forecasting of IBM stock prices, discussed by 
Brown (1962, p. 141). In this study, he used a quadratic model that would be, in the present 
notation, 

$$\hat{z}_t(l) = \beta_0^{(t)} + \beta_1^{(t)} l + \tfrac{1}{2} \beta_2^{(t)} l^2$$

With this model, he employed his method of discounted least squares to forecast stock 
prices 3 days ahead. The results obtained from this method are shown for a section of the 
IBM series in Figure A5.2, where they are compared with the minimum mean square error 
forecasts. 

The discounted least-squares method can be criticized on the following grounds: 

1. The nature of the forecast function ought to be decided by the autoregressive operator
$\varphi(B)$ in the stochastic model, and not arbitrarily. In particular, it cannot be safely
chosen by visual inspection of the time series itself. For example, consider the IBM
stock prices plotted in Figure A5.2. It will be seen that a quadratic function might
well be used to fit short pieces of this series to values already available. If such fitting
were relevant to forecasting, we might conclude, as did Brown, that a polynomial
forecast function of degree 2 was indicated. The most general linear process for
which a quadratic function would produce minimum mean square error forecasts at
every lead time l = 1, 2, ... is defined by the (0, 3, 3) model

$$\nabla^3 z_t = (1 - \theta_1 B - \theta_2 B^2 - \theta_3 B^3)a_t$$
FIGURE A5.2 IBM stock price series with comparison of lead 3 forecasts obtained from best
IMA(0,1,1) model and Brown’s quadratic forecast for a period beginning from July 11, 1960.

which, arguing as in Section 4.3.3, can be written as

$$\nabla^3 z_t = \nabla^3 a_t + \lambda_0 \nabla^2 a_{t-1} + \lambda_1 \nabla a_{t-1} + \lambda_2 a_{t-1}$$

However, we show in Chapter 7 that if this model is correctly fitted, the least-squares
estimates of the parameters are $\lambda_1 = \lambda_2 = 0$ and $\lambda_0 \simeq 1.0$. Thus, $\nabla z_t =
(1 - \theta B)a_t$, with $\theta = 1 - \lambda_0$ close to zero, is the appropriate stochastic model, and
the appropriate forecasting polynomial is $\hat{z}_t(l) = b_0^{(t)}$, which is of degree 0 in l and
not of degree 2.

2. The choice of the weight function in (A5.3.8) must correspondingly be decided 
by the stochastic model, and not arbitrarily. The use of the discounted least-squares 
fitting procedure would produce minimum mean square error forecasts in the very 
restricted case, where 

a. the process was of order (0, 1, 1), so $\nabla z_t = (1 - \theta B)a_t$,

b. a polynomial of degree 0 was fitted, and

c. the smoothing constant $\alpha$ was set equal to our $\lambda = 1 - \theta$.

In the present example, even if the correct polynomial model of degree 0 had been
chosen, the value $\alpha = \lambda = 0.1$, actually used by Brown, would have been quite
inappropriate. The correct value $\lambda$ for this series is close to unity.

3. The exponentially discounted weighted least-squares procedure forces all the p + d
coefficients in the updating vector h to be functions of the single smoothing parameter
$\alpha$. In fact, they should be functions of the p + q independent parameters $(\boldsymbol{\phi}, \boldsymbol{\theta})$.


Thus, the differences between the two methods are not trivial, and it is interesting to 
compare their performances on the IBM data. The minimum mean square error forecast is 
TABLE A5.3 Comparison of Mean Square Error of Forecasts Obtained at Various Lead
Times Using Best IMA(0,1,1) Model and Brown’s Quadratic Forecasts

Lead Time l              1     2     3     4     5     6     7     8     9    10
MSE (Brown)            102   158   218   256   363   452   554   669   799   944
MSE ($\lambda$ = 0.9)   42    91   136   180   282   266   317   371   427   483



$\hat{z}_t(l) = b_0^{(t)}$, with updating $b_0^{(t)} = b_0^{(t-1)} + \lambda a_t$, where $\lambda \simeq 1.0$. If $\lambda$ is taken to be exactly
equal to unity, this is equivalent to using

$$\hat{z}_t(l) = z_t$$

which implies that the best forecast of the stock price for all future time is the present
price.¹ The suggestion that stock prices behave in this way is, of course, not new and goes
back to Bachelier (1900). Since $z_t = Sa_t$ when $\lambda = 1$, this implies that $z_t$ is a random walk.

To compare the minimum mean square error forecast with Brown’s quadratic forecasts, 
a direct comparison was made using the IBM stock price series from July 11, 1960 to 
February 10, 1961, for 150 observations. For this stretch of the series, the minimum MSE 
forecast is obtained using the model $\nabla z_t = a_t - \theta a_{t-1}$, with $\theta = 0.1$, or $\lambda = 1 - \theta = 0.9$.
Figure A5.2 shows the minimum MSE forecasts for lead time 3 and the corresponding 
values of Brown’s quadratic forecasts. It is seen that the minimum MSE forecasts, which 
are virtually equivalent to using today’s price to predict that 3 days ahead, are considerably 
better than those obtained using Brown’s more complicated procedure. 

The mean square errors for the forecast at various lead times, computed by direct 
comparison of the values of the series with their lead l forecasts, are shown in Table A5.3
for the two types of forecasts. It is seen that Brown’s quadratic forecasts have mean square 
errors that are much larger than those obtained by the minimum mean square error method. 
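
The minimum mean square error forecast for the (0, 1, 1) model is simple enough to sketch in a few lines of R. The example below applies the updating $b_0^{(t)} = b_0^{(t-1)} + \lambda a_t$ to a simulated random walk, which is hypothetical data and not the IBM series, so the numbers it prints are not comparable with Table A5.3. The function names and the starting value $b_0 = z_1$ are assumptions made for illustration.

```r
# Update b0 <- b0 + lambda * a_t, where a_t = z_t - b0 is the lead-1 forecast error.
ima_forecasts <- function(z, lambda, b0_start = z[1]) {
  b0 <- numeric(length(z))
  prev <- b0_start
  for (t in seq_along(z)) {
    a_t <- z[t] - prev            # one-step-ahead forecast error
    prev <- prev + lambda * a_t   # updated level = forecast for every lead time
    b0[t] <- prev
  }
  b0
}

# Forecasts made at origin t are compared with z at time t + lead.
mse_at_lead <- function(z, fc, lead) {
  n <- length(z)
  mean((z[(1 + lead):n] - fc[1:(n - lead)])^2)
}

set.seed(1)                        # hypothetical data, not the IBM series
z <- 100 + cumsum(rnorm(150))      # simulated random walk

fc_rw <- ima_forecasts(z, lambda = 1)      # reduces to fc_rw[t] = z[t]
fc_09 <- ima_forecasts(z, lambda = 0.9)
print(sapply(1:3, function(l) mse_at_lead(z, fc_09, l)))
```

With $\lambda = 1$ the recursion returns the current observation as the forecast at every lead time, which is the random walk forecast discussed above; with $\lambda = 0.9$ each update places weight 0.9 on the newest observation and 0.1 on the previous level.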


EXERCISES 

5.1. For the models 

(1) $z_t - 0.5z_{t-1} = a_t$

(2) $\nabla z_t = a_t - 0.5a_{t-1}$

(3) $(1 - 0.6B)\nabla z_t = a_t$

write down the forecasts for lead times l = 1 and l = 2:

(a) From the difference equation

(b) In integrated form (using the $\psi_j$ weights)

(c) As a weighted average of previous observations 


1 This result is approximately true supposing that no relevant information except past values of the series itself is 
available and that fairly short forecasting periods are being considered. For longer periods, growth and inflationary 
factors would become important. 
5.2. The following observations represent values $z_{91}, z_{92}, \ldots, z_{100}$ from a series fitted by
the model $\nabla z_t = a_t - 1.1a_{t-1} + 0.28a_{t-2}$:

166, 172, 172, 169, 164, 168, 171, 167, 168, 172

(a) Generate the forecasts $\hat{z}_{100}(l)$ for l = 1, 2, ..., 12 and draw a graph of the series
values and the forecasts (assume $a_{90} = 0$, $a_{91} = 0$).

(b) With $\hat{\sigma}_a^2 = 1.103$, calculate the estimated standard deviations $\hat{\sigma}(l)$ of the forecast
errors and use them to calculate 80% probability limits for the forecasts. Insert
these probability limits on the graph, on either side of the forecasts.

5.3. Suppose that the data of Exercise 5.2 represent monthly sales. 

(a) Calculate the minimum mean square error forecasts for quarterly sales for 1, 2, 
3, 4 quarters ahead, using the data up to t = 100.

(b) Calculate 80% probability limits for these forecasts. 

5.4. Using the data and forecasts of Exercise 5.2, and given the further observation
$z_{101} = 174$:

(a) Calculate the forecasts $\hat{z}_{101}(l)$ for l = 1, 2, ..., 11 using the updating formula

$$\hat{z}_{t+1}(l) = \hat{z}_t(l+1) + \psi_l a_{t+1}$$

(b) Verify these forecasts using the difference equation directly.

5.5. For the model $\nabla z_t = a_t - 1.1a_{t-1} + 0.28a_{t-2}$ of Exercise 5.2:

(a) Write down expressions for the forecast errors $e_t(1), e_t(2), \ldots, e_t(6)$, from the
same origin t.

(b) Calculate and plot the autocorrelations of the series of forecast errors $e_t(3)$.

(c) Calculate and plot the correlations between the forecast errors $e_t(2)$ and $e_t(j)$ for
j = 1, 2, ..., 6.

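5.6. Let the vector $\mathbf{e}' = (e_1, e_2, \ldots, e_L)$ have for its elements the forecast errors made
1, 2, ..., L steps ahead, all from the same origin t. Then if $\mathbf{a}' = (a_{t+1}, a_{t+2}, \ldots, a_{t+L})$
are the corresponding uncorrelated random shocks, show that

$$\mathbf{e} = \mathbf{M}\mathbf{a} \qquad \text{where} \qquad \mathbf{M} = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ \psi_1 & 1 & 0 & \cdots & 0 \\ \psi_2 & \psi_1 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ \psi_{L-1} & \psi_{L-2} & \psi_{L-3} & \cdots & 1 \end{bmatrix}$$

Also, show that (e.g., Box and Tiao, 1976; Tiao et al., 1975) $\boldsymbol{\Sigma}_e$, the covariance matrix
of the e’s, is $\boldsymbol{\Sigma}_e = \sigma_a^2 \mathbf{M}\mathbf{M}'$ and hence that a test to determine if a set of subsequently
realized values $z_{t+1}, z_{t+2}, \ldots, z_{t+L}$ of the series taken jointly differ significantly from
the forecasts made at the origin t is obtained by referring

$$\mathbf{e}' \boldsymbol{\Sigma}_e^{-1} \mathbf{e} = \frac{\mathbf{e}'(\mathbf{M}\mathbf{M}')^{-1}\mathbf{e}}{\sigma_a^2} = \frac{\mathbf{a}'\mathbf{a}}{\sigma_a^2} = \frac{1}{\sigma_a^2} \sum_{j=1}^{L} a_{t+j}^2$$

to a chi-square distribution with L degrees of freedom. Note that $a_{t+j}$ is the one-step-ahead
forecast error calculated from $z_{t+j} - \hat{z}_{t+j-1}(1)$.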

5.7. Suppose that a quarterly economic time series is well represented by the model 

$$\nabla z_t = 0.5 + (1 - 1.0B + 0.5B^2)a_t$$

with $\sigma_a^2 = 0.04$.

(a) Given $z_{48} = 130$, $a_{47} = -0.3$, $a_{48} = 0.2$, calculate and plot the forecasts $\hat{z}_{48}(l)$
for l = 1, 2, ..., 12.

(b) Calculate and insert the 80% probability limits on the graph. 

(c) Express the series and forecasts in integrated form. 

5.8. Consider the annual Wolfer sunspot numbers for the period 1770-1869 listed as 
Series E in Part Five of this text. The same series is available for the longer period 
1700-1988 as "sunspot.year" in the datasets package of R. You can use either 
data set. Suppose that the series can be represented by an autoregressive model of 
order 3. 

(a) Plot the time series and comment. Does the series look stationary? 

(b) Generate forecasts and associated probability limits for up to 20 time periods 
ahead for the series. 

(c) Perform a square root transformation of the data and repeat (a) and (b) above.

(d) Use the function BoxCox.ar() in the TSA package of R to show that the square
root transformation is appropriate for this series; see help(BoxCox.ar) for details.
(Note: Adding a small amount, for example, 1/2, to the series eliminates zero
values and allows the program to consider a log transformation as an option.)

5.9. A time series representing a global mean land-ocean temperature index from 1880 
to 2009 is available in a file called “gtemp” in the astsa package of R. The data are 
temperature deviations, measured in degrees centigrade, from the 1951-1980 average
temperature, as described by Shumway and Stoffer (2011, p. 5). Assume that a third-
order autoregressive model is appropriate for the first differences $w_t = (1 - B)z_t$ of
this series.

(a) Plot the time series $z_t$ and the differenced series $w_t$ using R.

(b) Generate forecasts and associated probability limits for up to 20 time periods 
ahead for this series using the function sarima.for() without including a constant 
term in the model. 

(c) Generate the same forecasts and probability limits as in part (b) but with a 
constant term now added to the model. Discuss your findings and comment on 
the implications of including a constant in this case. 

5.10. For the model $(1 - 0.6B)(1 - B)z_t = (1 + 0.3B)a_t$, express the model explicitly in the
state-space form of (5.5.2) and (5.5.3), and write out precisely the recursive relations of
the Kalman filter for this model. Indicate how the (exact) forecasts $\hat{z}_{t+l|t}$ and their
forecast error variances $v_{t+l|t}$ are determined from these recursions.



PART TWO 


STOCHASTIC MODEL BUILDING 


We have seen that an ARIMA model of order ( p , d, q) provides a class of models capable 
of representing time series that, although not necessarily stationary, are homogeneous and 
in statistical equilibrium in many respects. 

The ARIMA model is defined by the equation 

$$\phi(B)(1 - B)^d z_t = \theta_0 + \theta(B)a_t$$

where $\phi(B)$ and $\theta(B)$ are operators in B of degree p and q, respectively, whose zeros lie
outside the unit circle. We have noted that the model is very general, including as special
cases autoregressive models, moving average models, mixed autoregressive-moving
average models, and the integrated forms of all three.

Iterative Approach to Model Building. The development of a model of this kind to describe 
the dependence structure in an observed time series is usually best achieved by a three-stage 
iterative procedure based on identification, estimation, and diagnostic checking. 

1. By identification we mean the use of the data, and of any information on how the 
series was generated, to suggest a subclass of parsimonious models worthy to be 
entertained. 

2. By estimation we mean efficient use of the data to make inferences about the
parameters conditional on the adequacy of the model entertained.

3. By diagnostic checking we mean checking the fitted model in its relation to the data 
with intent to reveal model inadequacies and so to achieve model improvement. 
In Chapter 6, which follows, we discuss model identification, in Chapter 7 estimation of
parameters, and in Chapter 8 diagnostic checking of the fitted model. In Chapter 9 we 
expand on the class of models developed in Chapters 3 and 4 to the seasonal ARIMA 
models, and all the model building techniques of the previous chapters are illustrated 
by applying them to modeling and forecasting seasonal time series. In Chapter 10 we 
consider some additional topics that represent extensions beyond the linear ARIMA class 
of models such as conditional heteroscedastic time series models, nonlinear time series 
models, and fractionally integrated long memory processes, which allow for certain more 
general features in the time series than are possible in the linear ARIMA models. Unit root 
testing is also discussed in this chapter. 



6 


MODEL IDENTIFICATION 


In this chapter, we discuss methods for identifying nonseasonal autoregressive integrated 
moving average (ARIMA) time series models. Identification methods are rough procedures 
applied to a set of data to indicate the kind of model that is worthy of further investigation. 
The specific aim here is to obtain some idea of the values of p, d, and q needed in the 
general linear ARIMA model and to obtain initial estimates for the parameters. The tentative 
model specified provides a starting point for the application of the more formal and efficient 
estimation methods described in Chapter 7. The examples used to demonstrate the
model-building process will include Series A-F that have been discussed in earlier chapters and
are listed in the Collection of Time Series in Part Five of this book. The series are also 
available electronically at http://pages.stat.wisc.edu/ reinsel/bjr-data/. 


6.1 OBJECTIVES OF IDENTIFICATION 

It should first be said that identification and estimation necessarily overlap. Thus, we may 
estimate the parameters of a model, which is more elaborate than the one we expect to 
use, so as to decide at what point simplification is possible. Here we employ the estimation 
procedure to carry out part of the identification. It should also be explained that identification 
is necessarily inexact. It is inexact because the question of what types of models occur in 
practice and in what specific cases depends on the behavior of the physical world and 
therefore cannot be decided by purely mathematical argument. Furthermore, because at 
the identification stage no precise formulation of the problem is available, statistically 
“inefficient” methods must necessarily be used. It is a stage at which graphical methods 
are particularly useful and judgment must be exercised. However, it should be kept in mind 
that the preliminary identification commits us to nothing except tentative consideration of
a class of models that will later be efficiently fitted and checked. 

6.1.1 Stages in the Identification Procedure 

Our task, then, is to identify an appropriate subclass of models from the general ARIMA 
family 

$$\phi(B)\nabla^d z_t = \theta_0 + \theta(B)a_t \tag{6.1.1}$$

which may be used to represent a given time series. Our approach will be as follows: 

1. To assess the stationarity of the process $z_t$ and, if necessary, to difference $z_t$ as many
times as is needed to produce stationarity, hopefully reducing the process under study
to the mixed autoregressive-moving average process:

$$\phi(B)w_t = \theta_0 + \theta(B)a_t$$

where

$$w_t = (1 - B)^d z_t = \nabla^d z_t$$

2. To identify the resulting autoregressive-moving average (ARMA) model for w t . 

Our principal tools for putting steps 1 and 2 into effect will be the sample autocorrelation 
function and the sample partial autocorrelation function. They are used not only to help 
guess the form of the model but also to obtain approximate estimates of the parameters. 
Such approximations are often useful at the estimation stage to provide starting values for 
iterative procedures employed at that stage. Some additional model identification tools may 
also be employed and are discussed in Section 6.2.4. 


6.2 IDENTIFICATION TECHNIQUES 

6.2.1 Use of the Autocorrelation and Partial Autocorrelation Functions in 
Identification 

Identifying the Degree of Differencing. We have seen in Section 3.4.2 that for a stationary
mixed autoregressive-moving average process of order (p, 0, q), $\phi(B)z_t = \theta(B)a_t$, the
autocorrelation function satisfies the difference equation

$$\phi(B)\rho_k = 0 \qquad k > q$$

Also, if $\phi(B) = \prod_{i=1}^{p}(1 - G_i B)$, the solution of this difference equation for the kth
autocorrelation is, assuming distinct roots, of the form

$$\rho_k = A_1 G_1^k + A_2 G_2^k + \cdots + A_p G_p^k \qquad k > q - p \tag{6.2.1}$$

The stationarity requirement that the zeros of $\phi(B)$ lie outside the unit circle implies that
the roots $G_1, G_2, \ldots, G_p$ lie inside the unit circle.
This expression shows that in the case of a stationary model in which none of the roots
lie close to the boundary of the unit circle, the autocorrelation function will quickly “die
out” for moderate and large k. Suppose now that a single real root, say $G_1$, approaches
unity, so that

$$G_1 = 1 - \delta$$

where $\delta$ is some small positive quantity. Then, since for k large

$$\rho_k \simeq A_1(1 - k\delta)$$


the autocorrelation function will not die out quickly and will fall off slowly and very nearly 
linearly. A similar argument may be applied if more than one of the roots approaches 
unity. 
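
The near-linear decay is easy to see numerically. The R sketch below takes the simplest case of a first-order autoregressive process with a single root $G_1 = 1 - \delta$, for which $A_1 = 1$ and $\rho_k = (1 - \delta)^k$, and compares the exact autocorrelations with the linear approximation $1 - k\delta$. The value $\delta = 0.01$ is an arbitrary choice for illustration.

```r
delta <- 0.01                                          # illustrative value only
k <- 0:10

# Theoretical autocorrelations of an AR(1) process with phi = 1 - delta.
rho_exact  <- ARMAacf(ar = 1 - delta, lag.max = 10)    # lags 0, 1, ..., 10
rho_linear <- 1 - k * delta                            # approximation A_1 (1 - k delta)

max_gap <- max(abs(unname(rho_exact) - rho_linear))
print(cbind(lag = k, exact = unname(rho_exact), linear = rho_linear))
print(max_gap)
```

Over the first 10 lags the two columns differ by less than 0.005, and the autocorrelation at lag 10 is still above 0.9, which is the slow, almost linear fall-off described above.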

Therefore, a tendency for the autocorrelation function not to die out quickly is taken as 
an indication that a root close to unity may exist. The estimated autocorrelation function 
tends to follow the behavior of the theoretical autocorrelation function. Therefore, failure 
of the estimated autocorrelation function to die out rapidly might logically suggest that 
we should treat the underlying stochastic process as nonstationary in $z_t$, but possibly as
stationary in $\nabla z_t$, or in some higher difference.

However, even though failure of the estimated autocorrelation function to die out rapidly 
suggests nonstationarity, the estimated autocorrelations need not be extremely high even 
at low lags. This is illustrated in Appendix A6.1, where the expected behavior of the 
estimated autocorrelation function is considered for the nonstationary (0,1,1) process 
$\nabla z_t = (1 - \theta B)a_t$. The ratio $E[c_k]/E[c_0]$ of expected values falls off only slowly, but
depends initially on the value of $\theta$ and on the number of observations in the series, and
need not be close to unity if $\theta$ is close to 1. We illustrate this point again in Section 6.3.4
for Series A.

For the reasons given, it is assumed that the degree of differencing d, necessary to 
achieve stationarity, has been reached when the autocorrelation function of $w_t = \nabla^d z_t$ dies
out fairly quickly. In practice, d is normally 0, 1, or 2, and it is usually sufficient to inspect 
the first 20 or so estimated autocorrelations of the original series, and of its first and second 
differences, if necessary. 

Overdifferencing. Once stationarity is achieved, further differencing should be avoided.
Overdifferencing introduces extra serial correlation and increases model complexity. To
illustrate this point, assume that the series $z_t$ follows a random walk so that the differenced
series $w_t = (1 - B)z_t = a_t$ is white noise and thus stationary. Further differencing of $w_t$ leads
to $(1 - B)w_t = (1 - B)a_t$, which is an MA(1) model for the second difference $(1 - B)w_t$ with
parameter $\theta = 1$. Thus, the resulting model for $z_t$ would be an ARIMA(0, 2, 1) model instead
of the simpler ARIMA(0, 1, 0) model. The model with $\theta = 1$ is noninvertible and the pure
autoregressive representation does not exist. Noninvertibility also causes problems at the
parameter estimation stage in that approximate maximum likelihood methods tend to produce
biased estimates in this case.

FIGURE 6.1 Estimated autocorrelation and partial autocorrelation functions for a simulated
random walk process and its first (d = 1) and second (d = 2) differences.

Figure 6.1 shows the autocorrelation and partial autocorrelation functions of a time series
of length 200 generated from a random walk model with innovations variance equal to 1. The
first 1000 observations were discarded to eliminate potential start-up effects. The estimated
autocorrelations up to lag 20 of the original series and its first and second differences are
shown in the graph. The autocorrelations of the original series fail to damp out quickly,
indicating a need for differencing. The autocorrelations of $w_t = \nabla z_t$, on the other hand,
are all small, demonstrating that stationarity has now been achieved. The autocorrelation
function of the second differences $w_t = \nabla^2 z_t$ also indicates stationarity, but it has a spike
at lag 1 showing the extra correlation that has emerged because of overdifferencing. The
value of $r_1$ is close to -0.5, which is consistent with the lag 1 autocorrelation coefficient
for an MA(1) model with $\theta = 1$. Figure 6.1 can be reproduced in R as follows:

> RW=arima.sim(list(order=c(0,1,0)),n=200,n.start=1000)
> acf0=acf(RW,20)
> pacf0=pacf(RW,20)
> acf1=acf(diff(RW),20)
> pacf1=pacf(diff(RW),20)
> acf2=acf(diff(diff(RW)),20)
> pacf2=pacf(diff(diff(RW)),20)
> par(mfrow=c(3,2))
> plot(acf0,main='d=0')
> plot(pacf0,main='d=0')
> plot(acf1,ylim=c(-0.5,0.5),main='d=1')
> plot(pacf1,ylim=c(-0.5,0.5),main='d=1')
> plot(acf2,ylim=c(-0.5,0.5),main='d=2')
> plot(pacf2,ylim=c(-0.5,0.5),main='d=2')
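
The value -0.5 for the lag 1 autocorrelation of an overdifferenced random walk can also be checked directly. The sketch below uses ARMAacf() from the stats package for the theoretical value and a simulated white noise series, which is hypothetical data with an arbitrary seed, for a sample estimate.

```r
# R writes the MA(1) operator as (1 + ma * B), so theta = 1 in the
# text's notation (1 - theta * B) corresponds to ma = -1.
rho <- ARMAacf(ma = -1, lag.max = 3)
print(rho)

# A simulated check on hypothetical data: second differences of a random walk
# are first differences of white noise.
set.seed(123)
a  <- rnorm(5000)
r1 <- acf(diff(a), lag.max = 1, plot = FALSE)$acf[2]
print(r1)
```

The theoretical autocorrelation is -0.5 at lag 1 and zero at higher lags, and the sample value from the simulated series should lie close to -0.5.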


Identifying a Stationary ARMA Model for the Differenced Series. Having tentatively 
decided on the degree of differencing d, we examine the patterns of the estimated autocorrelation 
and partial autocorrelation functions of the differenced series, $w_t = (1 - B)^d z_t$, to 
determine a suitable choice for the orders p and q of the autoregressive and moving average 
operators. Here we recall the characteristic behavior of the theoretical autocorrelation and 
partial autocorrelation functions for moving average, autoregressive, and mixed processes, 
discussed in Chapter 3. 

Briefly, whereas the autocorrelation function of an autoregressive process of order 
p tails off, its partial autocorrelation function has a cutoff after lag p. Conversely, the 
autocorrelation function of a moving average process of order q has a cutoff after lag q, 
while its partial autocorrelation function tails off. If both the autocorrelations and partial 
autocorrelations tail off, a mixed process is suggested. Furthermore, the autocorrelation 
function for a mixed process, containing a pth-order autoregressive component and a qth-order 
moving average component, is a mixture of exponentials and damped sine waves 
after the first q - p lags. Conversely, the partial autocorrelation function for a mixed process 
is dominated by a mixture of exponentials and damped sine waves after the first p - q lags 
(see Table 3.2). 

In general, autoregressive (moving average) behavior, as measured by the autocorrelation 
function, tends to mimic moving average (autoregressive) behavior as measured by the 
partial autocorrelation function. For example, the autocorrelation function of a first-order 
autoregressive process decays exponentially, while the partial autocorrelation function cuts 
off after the first lag. Correspondingly, for a first-order moving average process, the 
autocorrelation function cuts off after the first lag. Although not precisely exponential, the 
partial autocorrelation function is dominated by exponential terms and has the general 
appearance of an exponential. 
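
The contrast between the two first-order processes can be checked numerically. The following R sketch evaluates theoretical autocorrelations and partial autocorrelations with ARMAacf() from the stats package. The parameter values 0.8 and 0.5 are illustrative assumptions, not values taken from the text. Note that R writes the moving average operator with plus signs, so a value $\theta_1 = 0.5$ in the notation of this book is entered as ma = -0.5.

```r
# Theoretical ACF and PACF for first-order processes (illustrative parameter values).
phi   <- 0.8   # assumed AR(1) parameter
theta <- 0.5   # assumed MA(1) parameter, book convention w_t = a_t - theta * a_{t-1}

# AR(1): the ACF decays exponentially, the PACF cuts off after lag 1
ar1_acf  <- ARMAacf(ar = phi, lag.max = 6)[-1]              # drop lag 0
ar1_pacf <- ARMAacf(ar = phi, lag.max = 6, pacf = TRUE)

# MA(1): the ACF cuts off after lag 1, the PACF tails off
ma1_acf  <- ARMAacf(ma = -theta, lag.max = 6)[-1]           # note the sign change for R
ma1_pacf <- ARMAacf(ma = -theta, lag.max = 6, pacf = TRUE)

round(rbind(ar1_acf, ar1_pacf, ma1_acf, ma1_pacf), 3)
```

Each row of the printed matrix covers lags 1 to 6, so the cutoff and tail-off patterns can be read across the rows.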

Of particular importance are the autoregressive and moving average processes of first 
and second order and the simple mixed (1, d, 1) process. The properties of the theoretical 
autocorrelation and partial autocorrelation functions for these processes are summarized 
in Table 6.1, which requires careful study and provides a convenient reference table. The 
reader should also refer to Figures 3.2, 3.7, and 3.10, which show typical behavior of 
the autocorrelation function and the partial autocorrelation function for the second-order 
autoregressive process, the second-order moving average process, and the simple mixed 
ARMA(1,1) process. 

6.2.2 Standard Errors for Estimated Autocorrelations and Partial Autocorrelations 

Estimated autocorrelations can have rather large variances and can be highly autocorrelated 
with each other. For this reason, detailed adherence to the theoretical autocorrelation 
function cannot be expected in the estimated function. In particular, moderately large estimated 
autocorrelations can occur after the theoretical autocorrelation function has damped out, 
and apparent ripples and trends can occur in the estimated function that have no basis in the 
theoretical function. In employing the estimated autocorrelation function as a tool for 
identification, it is usually possible to be fairly sure about broad characteristics, but more subtle 
indications may or may not represent real effects. For these reasons, two or more related 



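TABLE 6.1 Behavior of the Autocorrelation Functions for the dth Difference of an ARIMA Process of Order (p, d, q)^a 

Order (1, d, 0) 
  Behavior of $\rho_k$:         Decays exponentially 
  Behavior of $\phi_{kk}$:      Only $\phi_{11}$ nonzero 
  Preliminary estimates from:   $\phi_1 = \rho_1$ 
  Admissible region:            $-1 < \phi_1 < 1$ 

Order (0, d, 1) 
  Behavior of $\rho_k$:         Only $\rho_1$ nonzero 
  Behavior of $\phi_{kk}$:      Exponential dominates decay 
  Preliminary estimates from:   $\rho_1 = \dfrac{-\theta_1}{1 + \theta_1^2}$ 
  Admissible region:            $-1 < \theta_1 < 1$ 

Order (2, d, 0) 
  Behavior of $\rho_k$:         Mixture of exponentials or damped sine wave 
  Behavior of $\phi_{kk}$:      Only $\phi_{11}$ and $\phi_{22}$ nonzero 
  Preliminary estimates from:   $\phi_1 = \dfrac{\rho_1(1 - \rho_2)}{1 - \rho_1^2}$, $\phi_2 = \dfrac{\rho_2 - \rho_1^2}{1 - \rho_1^2}$ 
  Admissible region:            $-1 < \phi_2 < 1$, $\phi_2 + \phi_1 < 1$, $\phi_2 - \phi_1 < 1$ 

Order (0, d, 2) 
  Behavior of $\rho_k$:         Only $\rho_1$ and $\rho_2$ nonzero 
  Behavior of $\phi_{kk}$:      Dominated by mixture of exponentials or damped sine wave 
  Preliminary estimates from:   $\rho_1 = \dfrac{-\theta_1(1 - \theta_2)}{1 + \theta_1^2 + \theta_2^2}$, $\rho_2 = \dfrac{-\theta_2}{1 + \theta_1^2 + \theta_2^2}$ 
  Admissible region:            $-1 < \theta_2 < 1$, $\theta_2 + \theta_1 < 1$, $\theta_2 - \theta_1 < 1$ 

Order (1, d, 1) 
  Behavior of $\rho_k$:         Decays exponentially from first lag 
  Behavior of $\phi_{kk}$:      Dominated by exponential decay from first lag 
  Preliminary estimates from:   $\rho_1 = \dfrac{(1 - \theta_1\phi_1)(\phi_1 - \theta_1)}{1 + \theta_1^2 - 2\phi_1\theta_1}$, $\rho_2 = \rho_1\phi_1$ 
  Admissible region:            $-1 < \phi_1 < 1$, $-1 < \theta_1 < 1$ 
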
^a Table A and Charts B-D are included at the end of this book to facilitate the calculation of approximate estimates of the parameters for first-order moving average, second-order 
autoregressive, second-order moving average, and the mixed ARMA(1, 1) processes. 

models may need to be entertained and investigated further at the estimation and diagnostic 
checking stages of model building. 

In practice, it is important to have some indication of how far an estimated value may 
differ from the corresponding theoretical value. In particular, we need some means for 
judging whether the autocorrelations and partial autocorrelations are effectively zero after 
some specific lag q or p , respectively. For larger lags, on the hypothesis that the process is 
moving average of order q, we can compute standard errors of estimated autocorrelations 
from the simplified form of Bartlett’s formula (2.1.15), with sample estimates replacing 
theoretical autocorrelations. Thus, 

$$\hat{\sigma}[r_k] \simeq \frac{1}{\sqrt{n}}\left[1 + 2\left(r_1^2 + r_2^2 + \cdots + r_q^2\right)\right]^{1/2} \qquad k > q \tag{6.2.2}$$

For the partial autocorrelations, we use the result quoted in (3.2.36) that, on the hypothesis 
that the process is autoregressive of order p, the standard error for estimated partial 
autocorrelations of order p + 1 and higher is 

$$\hat{\sigma}[\hat{\phi}_{kk}] \simeq \frac{1}{\sqrt{n}} \qquad k > p \tag{6.2.3}$$

It was shown by Anderson (1942) that for moderate n, the distribution of an estimated 
autocorrelation coefficient, whose theoretical value is zero, is approximately normal. Thus, 
on the hypothesis that the theoretical autocorrelation p k is zero, the estimate r k divided 
by its standard error will be approximately distributed as a unit normal variate. A similar 
result is true for the partial autocorrelations. These facts provide an informal guide as to 
whether theoretical autocorrelations and partial autocorrelations beyond a particular lag are 
essentially zero. 
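
As a rough illustration of how (6.2.2) and (6.2.3) might be applied, the following R sketch computes the large-lag standard error for a simulated series. The helper name bartlett_se, the simulated MA(1) series, and the choice q = 1 are assumptions made for this example only.

```r
# Large-lag standard error of r_k for k > q under an MA(q) hypothesis, following (6.2.2).
# `r` holds sample autocorrelations r_1, r_2, ... (lag 0 excluded); `n` is the series length.
bartlett_se <- function(r, q, n) {
  sqrt((1 + 2 * sum(r[seq_len(q)]^2)) / n)
}

set.seed(123)                                     # simulated data, for illustration only
w <- arima.sim(list(ma = -0.5), n = 200)          # MA(1), theta_1 = 0.5 in the book's sign convention
r <- acf(w, lag.max = 20, plot = FALSE)$acf[-1]   # r_1, ..., r_20
n <- length(w)

se_acf  <- bartlett_se(r, q = 1, n = n)           # applies to r_k for k > 1
se_pacf <- 1 / sqrt(n)                            # (6.2.3), partial autocorrelations beyond lag p

# Ratios of estimates to their standard error; magnitudes well above 2 merit attention
round(r[2:6] / se_acf, 2)
```

With q = 0 the function reduces to $1/\sqrt{n}$, the standard error under the white noise hypothesis.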

6.2.3 Identification of Models for Some Actual Time Series 

Series A-D. In this section, the model specification tools described above are applied to 
some of the actual time series that we encountered in earlier chapters. We first discuss 
potential models for Series A to D plotted in Figure 4.1. As remarked in Chapter 4 on 
nonstationarity, we expect Series A, C, and D to possess nonstationary characteristics since 
they represent the "uncontrolled" behavior of certain process outputs. Similarly, we would 
expect the IBM stock price Series B to have no fixed level and to be nonstationary. 

The estimated autocorrelations of $z_t$ and the first differences $\nabla z_t$ for Series A-D are 
shown in Figure 6.2. Figure 6.3 shows the corresponding estimated partial autocorrelations. 
The two figures were generated in R using commands similar to those used to produce 
Figure 6.1. For the chemical process concentration readings in Series A, the autocorrelations 
for $\nabla z_t$ are small after the first lag. This suggests that this time series might be described by 
an IMA(0,1,1) model. However, from the autocorrelation function of $z_t$, it is seen that after 
lag 1 the correlations do decrease fairly regularly. Therefore, an alternative is that the series 
follows a mixed ARMA(1,0,1) model. The partial autocorrelation function of $z_t$ seems to 
support this possibility. We will see later that the two alternatives result in virtually the 
same model. For the stock price Series B, the results confirm the nonstationarity of the 
original series and suggest that a random walk model $(1 - B)z_t = a_t$ is appropriate for this 
series. 

The estimated autocorrelations of the temperature Series C also indicate nonstationarity. 
The roughly exponential decay in the autocorrelations for the first difference suggests a 

FIGURE 6.2 Estimated autocorrelation functions of the original series (d = 0), and their first 
differences (d = 1) for Series A-D. 


process of order (1,1,0), with an autoregressive parameter $\phi$ around 0.8. Alternatively, 
we notice that the autocorrelations of $\nabla z_t$ decay at a relatively slow rate, suggesting 
that further differencing might be needed. The autocorrelation and partial autocorrelation 
functions of the second differences $\nabla^2 z_t$ (not shown) were rather small, suggesting a white 
noise process for the second differences. This implies that an IMA(0,2,0) model might 
also be appropriate for this series. Thus, the possibilities are 

$$(1 - 0.8B)(1 - B)z_t = a_t$$

$$(1 - B)^2 z_t = a_t$$

The two models are very similar, differing only in the value of the autoregressive 
coefficient: 0.8 in the first model and 1.0 in the second. 

Finally, the autocorrelation and partial autocorrelation functions for the viscosity 
Series D suggest that an AR(1) model $(1 - \phi B)z_t = a_t$ with $\phi$ around 0.8 might be 
appropriate for this series. Alternatively, since the autocorrelation coefficients decay at a 
relatively slow rate, we will also consider the model $(1 - B)z_t = a_t$ for this series. 

Series E and F. Series E shown in the top graph of Figure 6.4 represents the annual Wolfer 
sunspot numbers over the period 1770-1869. This series is likely to be stationary since 
the number of sunspots is expected to remain in equilibrium over long periods of time. 
The autocorrelation and partial autocorrelation functions in Figure 6.4 show characteristics 
similar to those of an AR(2) process. However, as will be seen later, a marginally better 

FIGURE 6.3 Estimated partial autocorrelation functions of the original series (d = 0), and their 
first differences (d = 1) for Series A-D. 


fit is obtained using an AR(3) model. The fit can be improved further using a square root 
or log transformation of the series. An autoregressive model of order nine is suggested by 
the order selection routine ar() in R, which selects the AR order based on the Akaike 
information criterion (AIC) to be discussed in Section 6.2.4. Other options considered in the 
literature include nonlinear time series models, such as bilinear or threshold autoregressive 
models, discussed briefly in Section 10.3. 
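
A minimal sketch of AIC-based order selection with ar() is given below. It assumes that the yearly sunspot series sunspot.year shipped with R's datasets package, restricted to 1770-1869, is an adequate stand-in for Series E; the values may differ from those of the series used in this book, so the selected order need not match the one quoted above. The maximum order of 12 is also an arbitrary choice for illustration.

```r
# AIC-based AR order selection with ar().
# `sunspot.year` is R's built-in yearly sunspot series, assumed here to stand in for Series E.
z <- window(sunspot.year, start = 1770, end = 1869)

fit <- ar(z, aic = TRUE, order.max = 12, method = "yule-walker")

fit$order            # order with the smallest AIC among 0, 1, ..., 12
round(fit$aic, 2)    # AIC differences relative to the best order (0 marks the minimum)
```

Inspecting the full vector of AIC differences, rather than only the selected order, shows how close competing orders such as 2 or 3 are to the minimum.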

Series F introduced in Chapter 2 represents the yields of a batch chemical process. 
The series is expected to be stationary since the batches are processed under uniformly 
controlled conditions. The stationarity is confirmed by Figure 6.5 that shows a graph of 
the series along with the autocorrelation and partial autocorrelation functions of the series 
and its first differences. The results for the undifferenced series suggest that a first-order 
autoregressive model might be appropriate for this series. 

A summary of the models tentatively identified for Series A to F is given in Table 6.2. 
Note that for Series C and F, the alternative models suggested above have been made 
slightly more general for further illustrations later on. 

Notes on the identification procedure. The graphs of the autocorrelation and partial 
autocorrelation functions shown above were generated using R. In assessing the estimated 
correlation functions, it is very helpful to plot one or two standard error limits around zero 
for the estimated coefficients. Limits from the R package are included in the graphs 
displayed above. These limits are approximate two standard error limits, $\pm 2/\sqrt{n}$, determined 

FIGURE 6.4 Estimated autocorrelation and partial autocorrelation functions of the sunspot series 
(Series E) and its first differences. 


under the assumption that all the theoretical autocorrelation coefficients are zero so that 
the series is white noise. If a hypothesis about a specific model is postulated, alternative 
limits could be determined from Bartlett's formula as discussed above. When the calculations 
are performed in R, inclusion of the argument ci.type="ma" in the acf() function 


TABLE 6.2 Tentative Identification of Models for Series A-F 

Series   Degree of      Apparent Nature of                          Identification 
         Differencing   Differenced Series                          for $z_t$ 
A        Either 0       Mixed first-order AR with first-order MA    (1, 0, 1) 
         or 1           First-order MA                              (0, 1, 1) 
B        1              First-order MA                              (0, 1, 1) 
C        Either 1       First-order AR                              (1, 1, 0) 
         or 2           Uncorrelated noise                          (0, 2, 2)^a 
D        Either 0       First-order AR                              (1, 0, 0) 
         or 1           Uncorrelated noise                          (0, 1, 1)^a 
E        Either 0       Second-order AR                             (2, 0, 0) 
         or 0           Third-order AR                              (3, 0, 0) 
F        0              Second-order AR                             (2, 0, 0) 

^a The order of the moving average operator appears to be zero, but the more general form is retained for 
subsequent consideration. 

FIGURE 6.5 Estimated autocorrelation and partial autocorrelation functions for the yield of a batch 
chemical process (Series F) and its first differences. 


yields confidence bounds computed based on the assumption that the true model is 
MA(k - 1). 
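
The two kinds of limits can be compared side by side. The sketch below uses a simulated MA(1) series, which is an assumption of this example, and passes ci.type to the plot method for acf objects.

```r
set.seed(42)                                   # simulated MA(1) series, illustrative only
w <- arima.sim(list(ma = -0.5), n = 200)       # theta_1 = 0.5 in the book's sign convention
ac <- acf(w, lag.max = 20, plot = FALSE)       # compute the estimates without plotting

op <- par(mfrow = c(1, 2))
plot(ac, main = "White noise limits")                      # default: ci.type = "white"
plot(ac, ci.type = "ma", main = "Bartlett (MA) limits")    # limits widen with the lag
par(op)
```

In the second panel the limits at lag k widen as the squared autocorrelations at lower lags accumulate, in line with Bartlett's formula.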

Three other points concerning this identification procedure need to be mentioned: 

1. Simple differencing of the kind we have used will not produce stationarity in series 
containing seasonal components. In Chapter 9, we discuss the appropriate modifications 
for such seasonal time series. 

2. As discussed in Chapter 4, a nonzero value for $\theta_0$ in (6.1.1) implies the existence of 
a systematic polynomial trend of degree d. For the nonstationary models in Table 
6.2, a value of $\theta_0 = 0$ can perfectly well account for the behavior of the series. 
Occasionally, however, there will be some real physical phenomenon requiring the 
provision of such a component. In other cases, it might be uncertain whether or not 
such a provision should be made. Some indication of the evidence supplied by the 
data, for the inclusion of $\theta_0$ in the model, can be obtained at the identification stage 
by comparing the mean $\bar{w}$ of $w_t = \nabla^d z_t$ with its approximate standard error, using 
$\sigma^2(\bar{w}) = n^{-1}\gamma_0[1 + 2\rho_1(w) + 2\rho_2(w) + \cdots]$. 

3. It was noted in Section 3.4.2 that, for any ARMA(p, q) process with p - q > 0, the 
whole positive half of the autocorrelation function will be a mixture of damped sine 
waves and exponentials. This does not, of course, prevent us from tentatively identifying 
q, because (a) the partial autocorrelation function will show p - q "anomalous" 
values before behaving like that of an MA(q) process, and (b) q must be such that the 
autocorrelation function could take, as starting values following the general pattern, 
$\rho_q$ back to $\rho_{-(p-q-1)}$. 
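
The comparison described in point 2 can be sketched in R as follows. The helper name mean_se, the truncation of the sum of autocorrelations at K lags, and the simulated series with a small drift are all assumptions made for this illustration.

```r
# Approximate standard error of the mean of w_t, with the sum of autocorrelations
# truncated at K lags (the truncation point K is an assumption of this sketch).
mean_se <- function(w, K) {
  n  <- length(w)
  c0 <- acf(w, lag.max = 0, type = "covariance", plot = FALSE)$acf[1]   # estimate of gamma_0
  r  <- acf(w, lag.max = K, plot = FALSE)$acf[-1]                       # r_1, ..., r_K
  sqrt(c0 * (1 + 2 * sum(r)) / n)
}

set.seed(7)                          # simulated series with a small drift, illustrative only
z <- cumsum(0.2 + rnorm(300))
w <- diff(z)                         # first differences, d = 1

c(mean = mean(w), se = mean_se(w, K = 5), ratio = mean(w) / mean_se(w, K = 5))
```

A ratio that is large in magnitude relative to 2 would give some indication that a constant term deserves consideration; the choice of K affects the result and should be checked for sensitivity.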


6.2.4 Some Additional Model Identification Tools 

Although the sample autocorrelation and partial autocorrelation functions are extremely 
useful in model identification, there are sometimes cases involving mixed models where 
they can provide ambiguous results. This may not be a serious problem since, as has been 
emphasized, model specification is always tentative and subject to further examination, 
diagnostic checking, and modification, if necessary. Nevertheless, there has been considerable 
interest in developing additional tools for use at the model identification stage. These 
include the R and S array approach proposed by Gray et al. (1978), the generalized partial 
autocorrelation function studied by Woodward and Gray (1981), the inverse autocorrelation 
function considered by Cleveland (1972) and Chatfield (1979), the extended sample 
autocorrelation function of Tsay and Tiao (1984), and the use of canonical correlation 
analysis as examined by Akaike (1976), Cooper and Wood (1982), and Tsay and Tiao (1985). 
Model selection criteria such as the AIC criterion introduced by Akaike (1974a) and the 
Bayesian Information Criterion (BIC) of Schwarz (1978) are also useful supplementary 
tools. 

Canonical Correlation Methods. For illustration, we briefly discuss the use of canonical 
correlation analysis for model identification. In general, for two sets of random variables, 
$Y_1 = (y_{11}, y_{12}, \ldots, y_{1k})'$ and $Y_2 = (y_{21}, y_{22}, \ldots, y_{2l})'$, of dimensions k and l (assume 
$k \le l$), canonical correlation analysis involves determining linear combinations $U_i = a_i'Y_1$ 
and $V_i = b_i'Y_2$, $i = 1, \ldots, k$, and corresponding correlations $\rho(i) = \mathrm{corr}[U_i, V_i]$ with 
$\rho(1) \ge \rho(2) \ge \cdots \ge \rho(k) \ge 0$. The linear combinations are chosen so that the $U_i$ and $V_j$ are mutually 
uncorrelated for $i \ne j$, $U_1$ and $V_1$ have the maximum possible correlation $\rho(1)$ among all 
linear combinations of $Y_1$ and $Y_2$, $U_2$ and $V_2$ have the maximum possible correlation $\rho(2)$ 
among all linear combinations of $Y_1$ and $Y_2$ that are uncorrelated with $U_1$ and $V_1$, and so 
on. The resulting correlations $\rho(i)$ are called the canonical correlations between $Y_1$ and 
$Y_2$, and the variables $U_i$ and $V_i$ are the corresponding canonical variates. If $\Omega = \mathrm{cov}[Y]$ 
denotes the covariance matrix of $Y = (Y_1', Y_2')'$, with $\Omega_{ij} = \mathrm{cov}[Y_i, Y_j]$, then it is known 
that the values $\rho^2(i)$ are the ordered eigenvalues of the matrix $\Omega_{11}^{-1}\Omega_{12}\Omega_{22}^{-1}\Omega_{21}$ and the 
vectors $a_i$, such that $U_i = a_i'Y_1$, are the corresponding (normalized) eigenvectors; that is, 
the $\rho^2(i)$ and $a_i$ satisfy 

$$[\rho^2(i)I - \Omega_{11}^{-1}\Omega_{12}\Omega_{22}^{-1}\Omega_{21}]\,a_i = 0 \qquad i = 1, \ldots, k \tag{6.2.4}$$

with $\rho^2(1) \ge \rho^2(2) \ge \cdots \ge \rho^2(k) \ge 0$ (e.g., Anderson (1984), p. 490). Similarly, one can 
define the notion of partial canonical correlations between $Y_1$ and $Y_2$, given another set 
of variables $Y_3$, as the canonical correlations between $Y_1$ and $Y_2$ after they have been 
"adjusted" for the effects of $Y_3$ by linear regression on $Y_3$, analogous to the definition of 
partial correlations as discussed in Section 3.2.5. A useful property to note is that if there 
exist (at least) $s \le k$ linearly independent linear combinations of $Y_1$ that are completely 
uncorrelated with $Y_2$, say $U = A'Y_1$ such that $\mathrm{cov}[Y_2, U] = \Omega_{21}A = 0$, then there are (at 
least) s zero canonical correlations between $Y_1$ and $Y_2$. This follows easily from (6.2.4) 
since there will be (at least) s linearly independent eigenvectors satisfying (6.2.4) with 




corresponding $\rho(i) = 0$. In effect, then, the number s of zero canonical correlations is equal 
to $s = k - r$, where $r = \mathrm{rank}(\Omega_{21})$. 

In the ARMA time series model context, following the approach of Tsay and Tiao (1985), 
we consider $Y_{m,t} = (z_t, z_{t-1}, \ldots, z_{t-m})'$ and examine the canonical correlation structure 
between the variables $Y_{m,t}$ and 

$$Y_{m,t-j-1} = (z_{t-j-1}, z_{t-j-2}, \ldots, z_{t-j-1-m})'$$

for various combinations of m = 0, 1, ... and j = 0, 1, .... A key feature to recall is that the 
autocovariance function $\gamma_k$ of an ARMA(p, q) process $z_t$ satisfies (3.4.2), and, in particular, 

$$\gamma_k - \sum_{i=1}^{p} \phi_i \gamma_{k-i} = 0 \qquad k > q$$

Thus, for example, if $m \ge p$, there is (at least) one linear combination of $Y_{m,t}$, 

$$z_t - \sum_{i=1}^{p} \phi_i z_{t-i} = (1, -\phi_1, \ldots, -\phi_p, 0, \ldots, 0)\,Y_{m,t} = \mathbf{a}'Y_{m,t} \tag{6.2.5}$$

such that 

$$\mathbf{a}'Y_{m,t} = a_t - \sum_{i=1}^{q} \theta_i a_{t-i}$$

which is uncorrelated with $Y_{m,t-j-1}$ for $j \ge q$. In particular, then, for m = p and j = q, there 
is one zero canonical correlation between $Y_{p,t}$ and $Y_{p,t-q-1}$, as well as between $Y_{p,t}$ and 
$Y_{p,t-j-1}$, $j > q$, and between $Y_{m,t}$ and $Y_{m,t-q-1}$, $m > p$, while in general it is not difficult to 
establish that there are $s = \min(m + 1 - p,\, j + 1 - q)$ zero canonical correlations between 
$Y_{m,t}$ and $Y_{m,t-j-1}$ for $m \ge p$ and $j \ge q$. Hence, one can see that determination of the 
structure of the zero canonical correlations between $Y_{m,t}$ and $Y_{m,t-j-1}$ for various values 
of m and j will serve to characterize the orders p and q of the ARMA model, and so the 
canonical correlations will be useful in model identification. We note the special cases of 
these canonical correlations are as follows. First, when m = 0, we are simply examining the 
autocorrelations $\rho_{j+1}$ between $z_t$ and $z_{t-j-1}$, which will all equal zero in an MA(q) process 
for $j \ge q$. Second, when j = 0, we are examining the partial autocorrelations $\phi_{m+1,m+1}$ 
between $z_t$ and $z_{t-m-1}$, given $z_{t-1}, \ldots, z_{t-m}$, and these will all equal zero in an AR(p) 
process for $m \ge p$. Hence, the canonical correlation analysis can be viewed as an extension 
of the analysis of the autocorrelation and partial autocorrelation functions of the process. 

In practice, based on (6.2.4), one is led to consider the sample canonical correlations 
$\hat{\rho}(i)$, which are determined from the eigenvalues of the matrix: 

$$\left(\sum_t Y_{m,t}Y_{m,t}'\right)^{-1}\left(\sum_t Y_{m,t}Y_{m,t-j-1}'\right)\left(\sum_t Y_{m,t-j-1}Y_{m,t-j-1}'\right)^{-1}\left(\sum_t Y_{m,t-j-1}Y_{m,t}'\right) \tag{6.2.6}$$

for various values of lag j = 0, 1, ... and m = 0, 1, .... Tsay and Tiao (1985) use a 
chi-squared test statistic approach based on the smallest eigenvalue (squared sample canonical 
correlation) $\hat{\lambda}(m,j)$ of (6.2.6). They propose the statistic $c(m,j) = -(n - m - j)\ln[1 - \hat{\lambda}(m,j)/d(m,j)]$, 
where $d(m,j) = 1 + 2\sum_{i=1}^{j} r_i^2(w')$, $j > 0$, $r_i(w')$ denotes the sample 
autocorrelation at lag i of $w_t' = z_t - \hat{\phi}_1^{(j)} z_{t-1} - \cdots - \hat{\phi}_m^{(j)} z_{t-m}$, and the $\hat{\phi}_i^{(j)}$ are estimates of 
the $\phi_i$'s obtained from the eigenvector (see, for example, equation (6.2.5)) corresponding 
to $\hat{\lambda}(m,j)$. The statistic c(m, j) has an asymptotic $\chi^2_1$ distribution when m = p and $j \ge q$ 
or when $m \ge p$ and j = q and can be used to test whether there exists a zero canonical 
correlation in theory. Hence if the sample statistics exhibit a pattern such that they are all 
insignificant, relative to a $\chi^2_1$ distribution, for $m \ge p$ and $j \ge q$ for some p and q values, then 
the model might reasonably be identified as an ARMA(p, q) for the smallest values (p, q) 
such that this pattern holds. Tsay and Tiao (1985) also show that this procedure is valid for 
nonstationary ARIMA models $\varphi(B)z_t = \theta(B)a_t$, in the sense that the overall order p + d 
of the generalized AR operator $\varphi(B)$ can be determined by the procedure, without initially 
deciding on differencing of the original series $z_t$. 
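
The eigenvalue computation behind this procedure can be sketched in R. The example below is not the Tsay and Tiao test itself: it only tabulates the smallest squared sample canonical correlation for a few values of m and j, using cancor() from the stats package, which centers the variables and therefore differs slightly from the raw cross-product form in (6.2.6). The helper name smallest_cc2 and the simulated ARMA(1,1) series are assumptions of this illustration.

```r
# Smallest squared sample canonical correlation between Y_{m,t} and Y_{m,t-j-1}.
# cancor() centers the variables, a simplification relative to (6.2.6).
smallest_cc2 <- function(z, m, j) {
  E <- embed(as.numeric(z), m + j + 2)               # columns: z_t, z_{t-1}, ..., z_{t-m-j-1}
  Y_now  <- E[, 1:(m + 1), drop = FALSE]             # Y_{m,t}
  Y_past <- E[, (j + 2):(j + m + 2), drop = FALSE]   # Y_{m,t-j-1}
  min(cancor(Y_now, Y_past)$cor)^2
}

set.seed(11)                                         # simulated ARMA(1,1), illustrative only
z <- arima.sim(list(ar = 0.7, ma = -0.4), n = 500)

# Rows index m = 0, 1, 2 and columns index j = 0, 1, 2
lam <- outer(0:2, 0:2, Vectorize(function(m, j) smallest_cc2(z, m, j)))
dimnames(lam) <- list(m = 0:2, j = 0:2)
round(lam, 3)
```

For a series that truly follows an ARMA(1,1) model, the entries with m and j both at least 1 should tend to be small, while the first row and first column reduce to squared autocorrelations and squared partial autocorrelations, respectively. A formal assessment would require the statistic c(m, j) described above.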

Canonical correlation methods were previously also proposed for ARMA modeling by 
Akaike (1976) and Cooper and Wood (1982). Their approach is to perform a canonical 
correlation analysis between the vector of present and past values, $P_t \equiv Y_{m,t} = 
(z_t, z_{t-1}, \ldots, z_{t-m})'$, and the vector of future values, $F_{t+1} = (z_{t+1}, z_{t+2}, \ldots)'$. In practice, 
the finite lag m used to construct the vector of present and past values $P_t$ may be fixed by 
use of an order determination criterion such as the Akaike information criterion, to be discussed 
a little later in this section, applied to fitting of AR models of various orders. The canonical 
correlation analysis is performed sequentially by adding elements to $F_{t+1}$ one at a time, 
starting with $F_{t+1}^* = (z_{t+1})$, until the first zero canonical correlation between $P_t$ and the 
$F_{t+1}^*$ is determined. Akaike (1976) uses an AIC-type criterion called deviance information 
criterion (DIC) to judge whether the smallest sample canonical correlation can be taken 
to be zero, while Cooper and Wood (1982) use a traditional chi-squared statistic approach 
to assess the significance of the smallest canonical correlation, although as pointed out by 
Tsay (1989a), to be valid in the presence of a moving average component, this statistic 
needs to be modified. 

At a given stage in the procedure, when the smallest sample canonical correlation 
between $P_t$ and $F_{t+1}^*$ is first judged to be 0 and $z_{t+K+1}$ is the most recent variable to be 
included in $F_{t+1}^*$, a linear combination of $z_{t+K+1}$ in terms of the remaining elements of 
$F_{t+1}^*$ is identified that is uncorrelated with the past. Specifically, the linear combination 
$z_{t+K+1} - \sum_{j=1}^{K} \phi_j z_{t+K+1-j}$ of the elements in the vector $F_{t+1}^*$ of future values is (in theory) 
determined to be uncorrelated with the past $P_t$. Hence, this canonical correlation analysis 
procedure determines that the forecasts $\hat{z}_t(K + 1)$ of the process satisfy 

$$\hat{z}_t(K + 1) - \sum_{j=1}^{K} \phi_j \hat{z}_t(K + 1 - j) = 0$$

By reference to the relation (5.3.2) in Section 5.3, for a stationary process, this implies that 
an ARMA model is identified for the process, with $K = \max\{p, q\}$. 

As can be seen, in the notation of Tsay and Tiao (1985), the methods of Akaike and 
Cooper and Wood represent canonical correlation analysis between $Y_{m,t}$ and $Y_{n-1,t+n}$ 
for various n = 1, 2, .... Since the Tsay and Tiao method considers canonical correlation 
analysis between $Y_{m,t}$ and $Y_{m,t-j-1}$ for various combinations of m = 0, 1, ... and j = 
0, 1, ..., it is more general and, in principle, it is capable of providing information on the 
orders p and q of the AR and MA parts of the model separately, rather than just the maximum 
of these two values. In practice, when using the methods of Akaike and Cooper and Wood, 
the more detailed information on the individual orders p and q would be determined at the 
stage of maximum likelihood estimation of the parameters of the ARMA(K, K) model. 

Use of Model Selection Criteria. Another approach to model selection involves the use 
of information criteria such as AIC proposed by Akaike (1974a) or the Bayesian information 
criterion of Schwarz (1978). In the implementation of this approach, a range of 
potential ARMA models is estimated by maximum likelihood methods to be discussed 
in Chapter 7, and for each model, a criterion such as AIC (normalized by sample size n), 
given by 

$$\mathrm{AIC}_{p,q} = \frac{-2\ln(\text{maximized likelihood}) + 2r}{n} \approx \ln(\hat{\sigma}_a^2) + r\,\frac{2}{n} + \text{constant}$$

or the related BIC given by 

$$\mathrm{BIC}_{p,q} = \ln(\hat{\sigma}_a^2) + r\,\frac{\ln(n)}{n}$$

is evaluated. Here, $\hat{\sigma}_a^2$ is the maximum likelihood estimate of $\sigma_a^2$, and r = p + q + 1 is the 
number of estimated parameters, including a constant term. In the above criteria, the first 
term essentially corresponds to -2/n times the log of the maximized likelihood, while the 
second term is a "penalty factor" for inclusion of additional parameters in the model. In 
the information criteria approach, models that yield a minimum value for the criterion are 
to be preferred, and the AIC or BIC values are compared among various models as the 
basis for selection of the model. Hence, since the BIC criterion imposes a greater penalty 
for the number of estimated model parameters than does AIC, use of minimum BIC for 
model selection would always result in a chosen model whose number of parameters is no 
greater than that chosen under AIC. 
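
A minimal sketch of this approach in R is shown below, assuming a simulated ARMA(1,1) series and a small grid of orders chosen only for illustration. The functions AIC() and BIC() applied to a fitted arima() object return criteria on the unnormalized scale (not divided by n) and count the innovations variance as a parameter, so the values differ from the expressions above by scaling and constants that do not affect the ranking of models fitted to the same series.

```r
set.seed(2024)                                  # simulated ARMA(1,1), illustrative only
z <- arima.sim(list(ar = 0.7, ma = -0.4), n = 300)

grid <- expand.grid(p = 0:2, q = 0:2)           # candidate orders (an arbitrary small grid)
grid$aic <- NA_real_
grid$bic <- NA_real_
for (i in seq_len(nrow(grid))) {
  fit <- tryCatch(
    arima(z, order = c(grid$p[i], 0, grid$q[i]), method = "ML"),
    error = function(e) NULL                    # skip a fit that fails
  )
  if (!is.null(fit)) {
    grid$aic[i] <- AIC(fit)
    grid$bic[i] <- BIC(fit)
  }
}

grid[which.min(grid$aic), c("p", "q")]          # order preferred by AIC
grid[which.min(grid$bic), c("p", "q")]          # order preferred by BIC
```

Printing the whole grid, rather than only the minimizing rows, shows how small the differences between neighboring models often are.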

Hannan and Rissanen (1982) proposed a two-step model selection procedure that avoids 
the need to maximize the likelihood function for multiple combinations of p and q. At the 
first step, one fits an AR model of sufficiently high order m* to the series $z_t$. The residuals 
$\hat{a}_t$ from the fitted AR(m*) model provide estimates of the innovations $a_t$ in the ARMA(p, q) 
model. At the second step, one regresses $z_t$ on $z_{t-1}, \ldots, z_{t-p}$ and $\hat{a}_{t-1}, \ldots, \hat{a}_{t-q}$, for various 
combinations of p and q. That is, one fits approximate models of the form 

$$z_t = \sum_{j=1}^{p} \phi_j z_{t-j} - \sum_{j=1}^{q} \theta_j \hat{a}_{t-j} + a_t \tag{6.2.7}$$

using ordinary least squares, and the estimated error variance, uncorrected for degrees of 
freedom, is denoted by $\hat{\sigma}_{p,q}^2$. Then, using the BIC criterion, the order (p, q) of the ARMA 
model is chosen as the one that minimizes $\ln(\hat{\sigma}_{p,q}^2) + (p + q)\ln(n)/n$. Hannan and Rissanen 
show that, under very general conditions, the estimators of p and q chosen in this manner 
tend almost surely to the true values. The appeal of this procedure is that computation of 
maximum likelihood estimates over a wide range of possible ARMA models is avoided. 
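
The two steps can be sketched in R using only least squares. The simulated ARMA(1,1) series, the long autoregressive order m_star = 10, the maximum order of 2, and the helper name hr_crit are assumptions of this illustration; a common estimation sample is used for all (p, q) so that the criterion values are comparable.

```r
set.seed(1)                                     # simulated ARMA(1,1), illustrative only
z <- as.numeric(arima.sim(list(ar = 0.7, ma = -0.4), n = 300))
z <- z - mean(z)
n <- length(z)

# Step 1: long autoregression of assumed order m_star, fitted by least squares
m_star <- 10
E <- embed(z, m_star + 1)                       # columns: z_t, z_{t-1}, ..., z_{t-m_star}
a_hat <- c(rep(NA, m_star), lm.fit(E[, -1, drop = FALSE], E[, 1])$residuals)

# Step 2: regress z_t on lagged z's and lagged residuals, then evaluate the criterion
max_order <- 2
idx <- (m_star + max_order + 1):n               # common estimation sample for all (p, q)
hr_crit <- function(p, q) {
  X <- cbind(if (p > 0) sapply(seq_len(p), function(i) z[idx - i]),
             if (q > 0) sapply(seq_len(q), function(j) a_hat[idx - j]))
  res <- if (is.null(X)) z[idx] else lm.fit(X, z[idx])$residuals
  log(mean(res^2)) + (p + q) * log(n) / n       # error variance uncorrected for degrees of freedom
}

grid <- expand.grid(p = 0:max_order, q = 0:max_order)
grid$crit <- mapply(hr_crit, grid$p, grid$q)
grid[which.min(grid$crit), ]                    # order minimizing the criterion
```

Only linear regressions are needed, which is what makes the procedure inexpensive compared with maximizing the likelihood for every candidate order.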

While these order selection procedures are useful, they should be viewed as supplementary 
tools to assist in the model selection process. In particular, they should not be 
used as a substitute for careful examination of the estimated autocorrelation and partial 
autocorrelation functions of the series, and critical examination of the residuals $\hat{a}_t$ from 
a fitted model should always be included as a major part of the overall model selection 
process. 


6.3 INITIAL ESTIMATES FOR THE PARAMETERS 

6.3.1 Uniqueness of Estimates Obtained from the Autocovariance Function 

While a given ARMA model has a unique autocovariance structure, the converse is not true 
without additional conventions imposed for uniqueness, as we discuss subsequently. At 
first sight this would seem to rule out the use of the estimated autocovariances as a means of 
identification. However, we show in Section 6.4 that the estimated autocovariance function 
may indeed be used for this purpose. The reason is that, although there exists a multiplicity 
of ARMA models possessing the same autocovariance function, there exists only one that 
expresses the current value of $w_t = \nabla^d z_t$ exclusively in terms of previous history and in 
stationary invertible form. 


6.3.2 Initial Estimates for Moving Average Processes 

As shown in Chapter 3, the first q autocorrelations of an MA(q) process are nonzero and can 
be written in terms of the parameters of the model as 

$$\rho_k = \frac{-\theta_k + \theta_1\theta_{k+1} + \theta_2\theta_{k+2} + \cdots + \theta_{q-k}\theta_q}{1 + \theta_1^2 + \theta_2^2 + \cdots + \theta_q^2} \qquad k = 1, 2, \ldots, q \tag{6.3.1}$$

The expression (6.3.1) for $\rho_1, \rho_2, \ldots, \rho_q$, in terms of $\theta_1, \theta_2, \ldots, \theta_q$, supplies q equations in 
q unknowns. Preliminary estimates of the $\theta$'s can be obtained by substituting the estimates 
$r_k$ for $\rho_k$ in (6.3.1) and solving the resulting nonlinear equations. A preliminary estimate 
of $\sigma_a^2$ may then be obtained from 

$$\gamma_0 = \sigma_a^2(1 + \theta_1^2 + \cdots + \theta_q^2)$$

by substituting the preliminary estimates of the $\theta$'s and replacing $\gamma_0 = \sigma_w^2$ by its estimate 
$c_0$. The numerical values of the estimated autocorrelation coefficients $r_k$ for the series z 
are conveniently obtained from R as follows: 


> ac=acf(z) 

> ac 


Preliminary Estimates for a (0, d, 1) Process. Table A in Part Five relates $\rho_1$ to $\theta_1$, and by 
substituting $r_1(w)$ for $\rho_1$ can be used to provide initial estimates for any (0, d, 1) process 
$w_t = (1 - \theta_1 B)a_t$, where $w_t = \nabla^d z_t$. 

Preliminary Estimates for a (0, d, 2) Process. Chart C in Part Five relates $\rho_1$ and $\rho_2$ to 
$\theta_1$ and $\theta_2$, and by substituting $r_1(w)$ and $r_2(w)$ for $\rho_1$ and $\rho_2$ can be used to provide initial 
estimates for any (0, d, 2) process. 

In obtaining preliminary estimates in this way, the following points should be kept in 
mind: 

1. The autocovariances are second moments of the joint distribution of the w's. Thus, 
the parameter estimates are obtained by equating sample moments to their theoretical 
values. It is well known that the method of moments is not necessarily efficient and 
can produce poor estimates for models that include moving average terms. However, 
the rough estimates obtained can be useful in obtaining fully efficient estimates, 
because they supply an approximate idea of "where in the parameter space to look" 
for the most efficient estimates. 

2. In general, the equation (6.3.1), obtained by equating moments, will have multiple
solutions. For instance, when q = 1,

$$\rho_1 = \frac{-\theta_1}{1 + \theta_1^2} \tag{6.3.2}$$

and hence from $\theta_1^2 + (1/\rho_1)\theta_1 + 1 = 0$, we see that both

$$\theta_1 = -\frac{1}{2\rho_1} + \left[\frac{1}{(2\rho_1)^2} - 1\right]^{1/2}$$

and

$$\theta_1 = -\frac{1}{2\rho_1} - \left[\frac{1}{(2\rho_1)^2} - 1\right]^{1/2} \tag{6.3.3}$$

are possible solutions. For illustration, the first lag autocorrelation of the first
difference of Series A is about $-0.4$. Substitution in (6.3.3) yields the pair of solutions
$\hat{\theta}_1 \simeq 0.5$ and $\hat{\theta}_1' \simeq 2.0$. However, the chosen value $\hat{\theta}_1 \simeq 0.5$ is the only value that
lies within the invertibility interval $-1 < \theta_1 < 1$. In fact, it is shown in Section 6.4.1
that it is always true that only one of the multiple solutions of (6.3.1) can satisfy the
invertibility condition.
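
The two moment solutions in (6.3.3) are easy to check numerically. The following R sketch solves the quadratic for a given $r_1$ and keeps the invertible root; the function name `ma1_moment_roots` is a hypothetical helper introduced here for illustration, not part of any package.

```r
# Moment estimates of theta_1 for an MA(1) model, from the lag-1 autocorrelation.
# Solves theta^2 + theta / r1 + 1 = 0, which is (6.3.2) rearranged.
# ma1_moment_roots is a hypothetical helper name used only for this sketch.
ma1_moment_roots <- function(r1) {
  stopifnot(r1 != 0, abs(r1) <= 0.5)   # real roots exist only when |r1| <= 0.5
  half <- -1 / (2 * r1)
  disc <- sqrt(half^2 - 1)
  c(half - disc, half + disc)          # the two roots are reciprocals of each other
}

roots <- ma1_moment_roots(-0.4)        # r1 of the first difference of Series A
theta_hat <- roots[abs(roots) < 1]     # keep the root inside the invertibility interval
print(roots)
print(theta_hat)
```

With $r_1 = -0.4$ the two roots are 0.5 and 2.0, and only the first is retained. When $|r_1|$ equals 0.5 exactly the roots coincide on the boundary and no strictly invertible solution exists.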


Examples. Series A, B, and D were all identified in Table 6.2 as possible IMA processes
of order (0, 1, 1). We have seen in Section 4.3.1 that this model may be written in the
following alternative forms:

$$\nabla z_t = (1 - \theta_1 B)a_t$$

$$\nabla z_t = \lambda_0 a_{t-1} + \nabla a_t \qquad (\lambda_0 = 1 - \theta_1)$$

$$z_t = \lambda_0 \sum_{j=1}^{\infty} (1 - \lambda_0)^{j-1} z_{t-j} + a_t$$

Using Table A in Part Five, the approximate estimates of the parameters shown in Table 6.3
were obtained.

TABLE 6.3 Initial Estimates of Parameters for Series A, B, and D

Series    $r_1$     $\hat{\theta}_1$    $\hat{\lambda}_0 = 1 - \hat{\theta}_1$
A         -0.41     0.5      0.5
B          0.09     -0.1     1.1
D         -0.05     0.1      0.9

Series C has been tentatively specified in Table 6.2 as an IMA(0,2,2) process:

$$\nabla^2 z_t = (1 - \theta_1 B - \theta_2 B^2)a_t$$

or equivalently,

$$\nabla^2 z_t = (\lambda_0 \nabla + \lambda_1)a_{t-1} + \nabla^2 a_t$$



Since the first two sample autocorrelations of $\nabla^2 z_t$ are very close to zero, Chart C in Part
Five gives $\hat{\theta}_1 = 0$, $\hat{\theta}_2 = 0$, so that $\hat{\lambda}_0 = 1 + \hat{\theta}_2 = 1$ and $\hat{\lambda}_1 = 1 - \hat{\theta}_1 - \hat{\theta}_2 = 1$. On this basis,
the series would be represented by

$$\nabla^2 z_t = a_t \tag{6.3.4}$$

This would mean that the second difference, $\nabla^2 z_t$, was very nearly a random (white noise)
series.


6.3.3 Initial Estimates for Autoregressive Processes 

For an assumed AR process of order 1 or 2, initial estimates for $\phi_1$ and $\phi_2$ can be calculated
by substituting estimates $r_j$ for the theoretical autocorrelations $\rho_j$ in the formulas of Table
6.1, which are obtained from the Yule-Walker equations (3.2.6). In particular, for an AR(1),
$\hat{\phi}_{11} = r_1$, and for an AR(2),

$$\hat{\phi}_{21} = \frac{r_1(1 - r_2)}{1 - r_1^2} \qquad \hat{\phi}_{22} = \frac{r_2 - r_1^2}{1 - r_1^2} \tag{6.3.5}$$

where $\hat{\phi}_{pj}$ denotes the estimated jth autoregressive parameter in a process of order p. The
corresponding formulas given by the Yule-Walker equations for higher order schemes may
be obtained by substituting the $r_j$ for the $\rho_j$ in (3.2.7). Thus,

$$\hat{\boldsymbol{\phi}} = \mathbf{R}_p^{-1}\mathbf{r} \tag{6.3.6}$$

where $\mathbf{R}_p$ is an estimate of the $p \times p$ matrix $\mathbf{P}_p$, as depicted following (3.2.6) in Section 3.2.2, of
autocorrelations up to order $p - 1$, and $\mathbf{r} = (r_1, r_2, \ldots, r_p)'$. For example, if p = 3, (6.3.6)
becomes

$$\begin{bmatrix} \hat{\phi}_{31} \\ \hat{\phi}_{32} \\ \hat{\phi}_{33} \end{bmatrix} = \begin{bmatrix} 1 & r_1 & r_2 \\ r_1 & 1 & r_1 \\ r_2 & r_1 & 1 \end{bmatrix}^{-1} \begin{bmatrix} r_1 \\ r_2 \\ r_3 \end{bmatrix} \tag{6.3.7}$$

A simple recursive method due to Levinson and Durbin for obtaining the estimates for an
AR(p) from those of an AR(p - 1) was discussed in Appendix A3.2.

It will be shown in Chapter 7 that in contrast to the situation for MA processes, the
autoregressive parameters obtained from (6.3.6) approximate the fully efficient maximum
likelihood estimates.

Example. Series E representing the sunspot data behaves in its undifferenced form like¹
an autoregressive process of second order:

$$(1 - \phi_1 B - \phi_2 B^2)z_t = a_t$$

Substituting the estimates $r_1 = 0.81$ and $r_2 = 0.43$, obtained using R, into (6.3.5), we have
$\hat{\phi}_1 = 1.32$ and $\hat{\phi}_2 = -0.63$.
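
As a hedged illustration of (6.3.5) and (6.3.6), the R sketch below solves the Yule-Walker system for a vector of sample autocorrelations and compares the result with the closed-form AR(2) expressions. The helper name `yule_walker_init` is hypothetical. With the two-decimal autocorrelations quoted in the text the formulas give roughly 1.34 and -0.66; the figures 1.32 and -0.63 quoted above are presumably based on less heavily rounded autocorrelations, which is an assumption on our part.

```r
# Yule-Walker initial estimates: solve R_p phi = r, as in (6.3.6).
# yule_walker_init is a hypothetical helper name used only for this sketch.
yule_walker_init <- function(r) {
  p <- length(r)
  R <- toeplitz(c(1, r)[1:p])   # p x p matrix of autocorrelations at lags 0, ..., p - 1
  drop(solve(R, r))
}

r <- c(0.81, 0.43)              # rounded sample autocorrelations quoted for Series E
phi_yw <- yule_walker_init(r)

# Closed-form AR(2) expressions (6.3.5), for comparison.
phi_21 <- r[1] * (1 - r[2]) / (1 - r[1]^2)
phi_22 <- (r[2] - r[1]^2) / (1 - r[1]^2)

print(round(phi_yw, 2))
print(round(c(phi_21, phi_22), 2))
```

The same function handles higher orders: passing three autocorrelations reproduces the p = 3 system displayed above.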

As a second example, consider again Series C identified as either of order (1,1,0) or
possibly (0,2,2). The first possibility would give

$$(1 - \phi_1 B)\nabla z_t = a_t$$

with $\hat{\phi}_1 = 0.81$, since $r_1$ for $\nabla z_t$ is 0.81.

This example is interesting because it makes clear that the two alternative models that
have been identified for this series are closely related. On the supposition that the series is
of order (0,2,2), we found in (6.3.4) that this simplifies to

$$(1 - B)(1 - B)z_t = a_t \tag{6.3.8}$$

The alternative

$$(1 - 0.81B)(1 - B)z_t = a_t \tag{6.3.9}$$

is very similar.


6.3.4 Initial Estimates for Mixed Autoregressive-Moving Average Processes 

It is often found, either initially or after suitable differencing, that $w_t = \nabla^d z_t$ is most
economically represented by a mixed ARMA process:

$$\phi(B)w_t = \theta(B)a_t$$

As noted in Section 6.2.1, a mixed process is indicated if both the autocorrelation and partial
autocorrelation functions tail off rather than either having a cutoff feature. Another helpful
fact in identifying the mixed process is that after lag $q - p$, the theoretical autocorrelations
of the mixed process behave like the autocorrelations of a pure autoregressive process
$\phi(B)w_t = a_t$ (see (3.4.3)). In particular, if the autocorrelation function of the dth difference
appears to be falling off exponentially from an aberrant first value $r_1$, we would suspect
that we have a process of order (1, d, 1), that is,

$$(1 - \phi_1 B)w_t = (1 - \theta_1 B)a_t \tag{6.3.10}$$

where $w_t = \nabla^d z_t$.


¹ The sunspot series has been the subject of much investigation. Early references include Schuster (1906), Yule
(1927), and Moran (1954). The series does not appear to be adequately represented by a second-order
autoregressive process. A model related to the underlying mechanism at work would, of course, be the most satisfactory.
More recent work has suggested empirically that a second-order autoregressive model would provide a better fit
if a suitable transformation such as log or square root were first applied to z. Inclusion of a higher order term, at
lag 9, in the AR model also improves the fit. Other possibilities include the use of nonlinear time series models,
such as bilinear or threshold autoregressive models (e.g., see Section 10.3), as has been investigated by Subba
Rao and Gabr (1984), Tong and Lim (1980), and Tong (1983, 1990).

Approximate values for the parameters of the process (6.3.10) are obtained by substituting
the estimates $r_1(w)$ and $r_2(w)$ for $\rho_1$ and $\rho_2$ in the expression (3.4.8). This gives

$$r_1 = \frac{(1 - \hat{\phi}_1\hat{\theta}_1)(\hat{\phi}_1 - \hat{\theta}_1)}{1 + \hat{\theta}_1^2 - 2\hat{\phi}_1\hat{\theta}_1}$$

$$r_2 = r_1\hat{\phi}_1$$

Chart D in Part Five relates $\rho_1$ and $\rho_2$ to $\phi_1$ and $\theta_1$ and can be used to provide initial estimates
of the parameters for any (1, d, 1) process.

For example, using Figure 6.2, Series A was identified as of order (0,1,1), with $\theta_1$
about 0.5. Looking at the autocorrelation function of $z_t$ rather than that of $w_t = \nabla z_t$, we see
that from $r_1$ onward the autocorrelations decay roughly exponentially, although slowly. Thus,
an alternative specification for Series A is that it is generated by a stationary process of
order (1,0,1). The estimated autocorrelations and the corresponding initial estimates of the
parameters are then

$$r_1 = 0.57 \qquad r_2 = 0.50 \qquad \hat{\phi}_1 \simeq 0.87 \qquad \hat{\theta}_1 \simeq 0.48$$

This identification yields the approximate model of order (1,0,1):

$$(1 - 0.9B)z_t = (1 - 0.5B)a_t$$

whereas the previously identified model of order (0,1,1), given in Table 6.5, is

$$(1 - B)z_t = (1 - 0.5B)a_t$$

Again we see that the “alternative” models are nearly the same.
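
When Chart D is not at hand, the pair of moment equations for the (1, d, 1) process can be solved directly. The R sketch below, with the hypothetical helper `arma11_init`, takes $\hat{\phi}_1 = r_2/r_1$ and then solves the quadratic in $\hat{\theta}_1$ obtained by clearing the denominator in the expression for $r_1$. For Series A it gives values near 0.88 and 0.50, close to the chart-based 0.87 and 0.48 quoted above.

```r
# Initial estimates for an ARMA(1,1) model from r1 and r2.
# phi_1 follows from r2 = r1 * phi_1; theta_1 then solves
#   (r1 - phi) * theta^2 + (1 + phi^2 - 2 * r1 * phi) * theta + (r1 - phi) = 0.
# arma11_init is a hypothetical helper name used only for this sketch.
arma11_init <- function(r1, r2) {
  phi <- r2 / r1
  a <- r1 - phi
  b <- 1 + phi^2 - 2 * r1 * phi
  # polyroot() takes coefficients in increasing powers; Re() assumes the roots are real,
  # which holds here but should be checked for other inputs.
  roots <- Re(polyroot(c(a, b, a)))
  theta <- roots[abs(roots) < 1]      # keep the invertible solution
  c(phi = phi, theta = theta)
}

est <- arma11_init(r1 = 0.57, r2 = 0.50)   # sample autocorrelations of Series A
print(round(est, 2))
```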

Compensation between Autoregressive and Moving Average Operators. The alternative
models identified above are even more alike than they appear. This is because small
changes in the autoregressive operator of a mixed model can be nearly compensated by
corresponding changes in the moving average operator. In particular, if we have a model

$$[1 - (1 - \delta)B]z_t = (1 - \theta B)a_t$$

where $\delta$ is small and positive, we can write

$$(1 - B)z_t = [1 - (1 - \delta)B]^{-1}(1 - B)(1 - \theta B)a_t$$

$$= \{1 - \delta B[1 + (1 - \delta)B + (1 - \delta)^2B^2 + \cdots]\}(1 - \theta B)a_t$$

$$= [1 - (\theta + \delta)B]a_t + \text{terms in } a_{t-2}, a_{t-3}, \ldots, \text{of order } \delta$$
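
The expansion above can be checked numerically with `ARMAtoMA()` from the stats package, which returns the coefficients of $a_{t-1}, a_{t-2}, \ldots$ in the moving average expansion. In this sketch the values $\delta = 0.1$ and $\theta = 0.5$ are chosen to mimic the Series A example; note that R writes MA operators with plus signs, so the signs of the MA coefficients are reversed relative to the notation used here.

```r
# Coefficients of a_{t-1}, a_{t-2}, ... in (1 - B) z_t when
# [1 - (1 - delta) B] z_t = (1 - theta B) a_t.
delta <- 0.1
theta <- 0.5
psi <- ARMAtoMA(ar = 1 - delta,
                ma = c(-(1 + theta), theta),  # (1 - B)(1 - theta B) in R's "+" convention
                lag.max = 6)
print(round(psi, 4))
```

The first coefficient equals $-(\theta + \delta)$ and the remaining ones are small multiples of $\delta$, in line with the last line of the derivation.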


6.3.5 Initial Estimate of Error Variance 

For comparison with the more efficient methods of estimation to be described in Chapter 7,
it is interesting to see how much additional information about the model can be extracted at
the identification stage. We have already shown how to obtain initial estimates $(\hat{\boldsymbol{\phi}}, \hat{\boldsymbol{\theta}})$ of the
parameters $(\boldsymbol{\phi}, \boldsymbol{\theta})$ in the ARMA model, identified for an appropriate difference $w_t = \nabla^d z_t$
of the series. In this section we show how to obtain preliminary estimates of the error
variance $\sigma_a^2$, and in Section 6.3.6 we show how to obtain an approximate standard error for
the sample mean $\bar{w}$ of the appropriately differenced series.

An initial estimate of the error variance may be obtained by substituting an estimate $c_0$
in the expression for the variance $\gamma_0$ given in Chapter 3. Thus, substituting in (3.2.8), an
initial estimate of $\sigma_a^2$ for an AR process may be obtained from

$$\hat{\sigma}_a^2 = c_0(1 - \hat{\phi}_1 r_1 - \hat{\phi}_2 r_2 - \cdots - \hat{\phi}_p r_p) \tag{6.3.11}$$

Similarly, from (3.3.3), an initial estimate for an MA process may be obtained from

$$\hat{\sigma}_a^2 = \frac{c_0}{1 + \hat{\theta}_1^2 + \cdots + \hat{\theta}_q^2} \tag{6.3.12}$$

The form of the estimate for a mixed process is, in general, more complicated. However,
for the important ARMA(1,1) process, it takes the form (see (3.4.7))

$$\hat{\sigma}_a^2 = \frac{1 - \hat{\phi}_1^2}{1 + \hat{\theta}_1^2 - 2\hat{\phi}_1\hat{\theta}_1}\, c_0 \tag{6.3.13}$$

For example, consider the (1, 0, 1) model identified for Series A. Using (6.3.13) with
$\hat{\phi}_1 = 0.87$, $\hat{\theta}_1 = 0.48$, and $c_0 = 0.1586$, we obtain the estimate $\hat{\sigma}_a^2 = 0.098$.
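
The three variance formulas translate directly into code. The R sketch below uses hypothetical helper names (`sigma2_ar`, `sigma2_ma`, `sigma2_arma11`) and reproduces the Series A calculation from (6.3.13).

```r
# Initial estimates of the error variance, following (6.3.11)-(6.3.13).
# The function names are hypothetical helpers introduced for this sketch.
sigma2_ar <- function(c0, phi, r) c0 * (1 - sum(phi * r))     # (6.3.11)
sigma2_ma <- function(c0, theta) c0 / (1 + sum(theta^2))      # (6.3.12)
sigma2_arma11 <- function(c0, phi1, theta1) {                 # (6.3.13)
  c0 * (1 - phi1^2) / (1 + theta1^2 - 2 * phi1 * theta1)
}

# Series A, (1, 0, 1) model, with the values quoted in the text.
s2_A <- sigma2_arma11(c0 = 0.1586, phi1 = 0.87, theta1 = 0.48)
print(round(s2_A, 3))
```

Setting `theta1 = 0` in `sigma2_arma11` gives the same value as `sigma2_ar` for an AR(1) with $r_1 = \hat{\phi}_1$, which is a quick consistency check on the formulas.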


6.3.6 Approximate Standard Error for $\bar{w}$

The general ARIMA model, for which the mean $\mu_w$ of $w_t = \nabla^d z_t$ is not necessarily zero,
may be written in any one of the three forms:

$$\phi(B)(w_t - \mu_w) = \theta(B)a_t \tag{6.3.14}$$

$$\phi(B)w_t = \theta_0 + \theta(B)a_t \tag{6.3.15}$$

$$\phi(B)w_t = \theta(B)(a_t + \xi) \tag{6.3.16}$$

where

$$\mu_w = \frac{\theta_0}{1 - \phi_1 - \phi_2 - \cdots - \phi_p} = \frac{(1 - \theta_1 - \theta_2 - \cdots - \theta_q)\,\xi}{1 - \phi_1 - \phi_2 - \cdots - \phi_p}$$

Hence, if $1 - \phi_1 - \phi_2 - \cdots - \phi_p \neq 0$ and $1 - \theta_1 - \theta_2 - \cdots - \theta_q \neq 0$, $\mu_w = 0$ implies that
$\theta_0 = 0$ and $\xi = 0$. Now, in general, when d = 0, $\mu_z$ will not be zero. However, consider
the eventual forecast function associated with the general model (6.3.14) when d > 0.
With $\mu_w = 0$, this forecast function already contains an adaptive polynomial component of
degree $d - 1$. The effect of allowing $\mu_w$ to be nonzero is to introduce a fixed polynomial
term into this function of degree d. For example, if d = 2 and $\mu_w$ is nonzero, the forecast
function $\hat{z}_t(l)$ includes a quadratic component in $l$, in which the coefficient of the quadratic
term is fixed and does not adapt to the series. Because models of this kind are often
inapplicable when d > 0, the hypothesis that $\mu_w = 0$ will frequently not be contradicted by
the data. Indeed, as we have indicated, we usually assume that $\mu_w = 0$ unless evidence to
the contrary presents itself.

At this, the identification stage of model building, an indication of whether or not a
nonzero value for $\mu_w$ is needed may be obtained by comparison of $\bar{w} = \sum_{t=1}^{n} w_t/n$ with
its approximate standard error (see Section 2.1.5). With $n = N - d$ differences available,

$$\sigma^2(\bar{w}) = n^{-1}\gamma_0 \sum_{j=-\infty}^{\infty} \rho_j = n^{-1} \sum_{j=-\infty}^{\infty} \gamma_j$$

that is,

$$\sigma^2(\bar{w}) = n^{-1}\gamma(1) \tag{6.3.17}$$

where $\gamma(B)$ is the autocovariance generating function defined in (3.1.10) and $\gamma(1)$ is its
value when $B = B^{-1} = 1$ is substituted.

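For illustration, consider the process of order (1, d, 0):

$$(1 - \phi B)(w_t - \mu_w) = a_t$$

with $w_t = \nabla^d z_t$. From (3.1.11), we obtain

$$\gamma(B) = \frac{\sigma_a^2}{(1 - \phi B)(1 - \phi F)}$$

so

$$\sigma^2(\bar{w}) = n^{-1}(1 - \phi)^{-2}\sigma_a^2$$

But $\sigma_a^2 = \sigma_w^2(1 - \phi^2)$, so

$$\sigma^2(\bar{w}) = \frac{\sigma_w^2}{n}\,\frac{1 - \phi^2}{(1 - \phi)^2} = \frac{\sigma_w^2}{n}\,\frac{1 + \phi}{1 - \phi}$$

and

$$\sigma(\bar{w}) = \sigma_w\left[\frac{1 + \phi}{n(1 - \phi)}\right]^{1/2}$$

Now $\phi$ and $\sigma_w^2$ are estimated by $r_1$ and $c_0$, respectively, as defined in (2.1.11) and (2.1.12).
Thus, for a (1, d, 0) process, the required standard error is given by

$$\hat{\sigma}(\bar{w}) = \left[\frac{c_0(1 + r_1)}{n(1 - r_1)}\right]^{1/2}$$

Proceeding in this way, the expressions for $\hat{\sigma}(\bar{w})$ given in Table 6.4 may be obtained.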
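
As a rough numerical check, the R sketch below evaluates the (1, d, 0) expression just derived, together with the corresponding (0, d, 1) expression from Table 6.4, using the values of $c_0$ and $r_1$ reported in this chapter. The helper names are hypothetical, and the sample sizes are assumptions: n = 225 supposes that Series C has 226 observations and n = 368 supposes that Series B has 369.

```r
# Approximate standard error of the sample mean of w_t.
# se_wbar_ar1 and se_wbar_ma1 are hypothetical helper names for this sketch.
se_wbar_ar1 <- function(c0, r1, n) sqrt(c0 * (1 + r1) / (n * (1 - r1)))  # order (1, d, 0)
se_wbar_ma1 <- function(c0, r1, n) sqrt(c0 * (1 + 2 * r1) / n)           # order (0, d, 1)

# Series C, first differences treated as AR(1); n = 225 is an assumed sample size.
se_C <- se_wbar_ar1(c0 = 0.0532, r1 = 0.81, n = 225)
# Series B, first differences treated as MA(1); n = 368 is an assumed sample size.
se_B <- se_wbar_ma1(c0 = 52.54, r1 = 0.09, n = 368)

print(round(c(se_C = se_C, se_B = se_B), 3))
```

Under these assumptions the results are about 0.047 and 0.41. A sample mean that is small relative to its standard error, such as $-0.035$ against 0.047 for the differenced Series C, gives no reason to include a constant term.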


Tentative Identification of Models A-F. Table 6.5 summarizes the models tentatively 
identified for Series A to F, with the preliminary parameter estimates inserted. These 
parameter values are used as initial guesses for the more efficient estimation methods to be 
described in Chapter 7. 


TABLE 6.4 Approximate Standard Error for $\bar{w}$, where $w_t = \nabla^d z_t$ and $z_t$ is an ARIMA Process
of Order (p, d, q)

Order        $\hat{\sigma}(\bar{w})$
(1, d, 0)    $\left[\dfrac{c_0(1 + r_1)}{n(1 - r_1)}\right]^{1/2}$
(0, d, 1)    $\left[\dfrac{c_0(1 + 2r_1)}{n}\right]^{1/2}$
(2, d, 0)    $\left[\dfrac{c_0(1 + r_1)(1 - 2r_1^2 + r_2)}{n(1 - r_1)(1 - r_2)}\right]^{1/2}$
(0, d, 2)    $\left[\dfrac{c_0(1 + 2r_1 + 2r_2)}{n}\right]^{1/2}$
(1, d, 1)    $\left[\dfrac{c_0}{n}\left(1 + \dfrac{2r_1^2}{r_1 - r_2}\right)\right]^{1/2}$


6.3.7 Choice Between Stationary and Nonstationary Models in Doubtful Cases

As the results in Tables 6.2 and 6.5 suggest, the need for differencing and the degree of
differencing are not always easily determined at the preliminary identification stage. The
ambiguity in identifying models for Series A, C, and D (particularly with regard
to the degree of differencing) is, of course, more apparent than real. It arises whenever the
roots of $\phi(B) = 0$ approach unity. When this happens, it becomes less and less important
whether a root near unity is included in $\phi(B)$ or an additional difference is included
corresponding to a unit root. A more precise evaluation is possible using the estimation
procedures discussed in Chapter 7 and, in particular, the more formal unit root testing
procedures to be discussed in Chapter 10. However, the following should be kept in mind:

1. From time series that are necessarily of finite length, it is never possible to prove that
a zero of the autoregressive operator is exactly equal to unity.

2. There is, of course, no sudden transition from stationary behavior to nonstationary
behavior. This can be understood by considering the behavior of the simple mixed
model

$$(1 - \phi_1 B)(z_t - \mu) = (1 - \theta_1 B)a_t$$

Series generated by such a model behave in a more nonstationary manner as $\phi_1$ increases
toward unity. For example, a series with $\phi_1 = 0.99$ can wander away from its mean $\mu$ and
not return for very long periods. It is as if the attraction that the mean exerts in the series
becomes less and less as $\phi_1$ approaches unity, and finally, when $\phi_1$ is equal to unity, the
behavior of the series is completely independent of $\mu$.


TABLE 6.5 Summary of Models Identified for Series A-F, with Initial Estimates Inserted

Series  Differencing  $\bar{w} \pm \hat{\sigma}(\bar{w})$ (a)  $\hat{\sigma}_w^2 = c_0$  Identified Model                                              $\hat{\sigma}_a^2$
A       Either 0      17.06 ± 0.10      0.1586    $z_t - 0.87z_{t-1} = 2.45 + a_t - 0.48a_{t-1}$                 0.098
        or 1          0.002 ± 0.011     0.1364    $\nabla z_t = a_t - 0.53a_{t-1}$                                 0.107
B       1             -0.28 ± 0.41      52.54     $\nabla z_t = a_t + 0.09a_{t-1}$                                 52.2
C       Either 1      -0.035 ± 0.047    0.0532    $\nabla z_t - 0.81\nabla z_{t-1} = a_t$                          0.019
        or 2          -0.003 ± 0.008    0.0198    $\nabla^2 z_t = a_t - 0.09a_{t-1} - 0.07a_{t-2}$                 0.020
D       Either 0      9.13 ± 0.04       0.3620    $z_t - 0.86z_{t-1} = 1.32 + a_t$                                 0.093
        or 1          0.004 ± 0.017     0.0965    $\nabla z_t = a_t - 0.05a_{t-1}$                                 0.096
E       Either 0      46.9 ± 5.4        1382.2    $z_t - 1.32z_{t-1} + 0.63z_{t-2} = 14.9 + a_t$                   289.0
        or 0          46.9 ± 5.4        1382.2    $z_t - 1.37z_{t-1} + 0.74z_{t-2} - 0.08z_{t-3} = 13.7 + a_t$     287.0
F       0             51.1 ± 1.1        139.80    $z_t + 0.32z_{t-1} - 0.18z_{t-2} = 58.3 + a_t$                   115.0

(a) When d = 0, read z for w.

In doubtful cases, there may be an advantage in employing the nonstationary model
rather than the stationary alternative (e.g., in treating a $\phi_1$, whose estimate is close to unity,
as being equal to unity). This is particularly true in forecasting and control problems. Where
$\phi_1$ is close to unity, we do not really know whether the mean of the series has meaning or
not. Therefore, it may be advantageous to employ the nonstationary model, which does not
include a fixed mean $\mu$. If we use such a model, forecasts of future behavior will not in any
way depend on an estimated mean, calculated from a previous period, which may have no
relevance to the future level of the series.
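
The weakening pull of the mean as $\phi_1$ approaches unity can be quantified for the pure AR(1) case (that is, with $\theta_1 = 0$, a simplification made here for illustration). The R sketch below tabulates two standard quantities: the number of lags needed for the autocorrelation $\phi_1^k$ to fall to one half, and the ratio of the series variance to the shock variance, $1/(1 - \phi_1^2)$.

```r
# Persistence of a stationary AR(1) series as phi_1 approaches unity.
phi <- c(0.5, 0.9, 0.99)
half_life <- log(0.5) / log(phi)   # lags until the autocorrelation phi^k drops to 0.5
var_ratio <- 1 / (1 - phi^2)       # var(z_t) / var(a_t)
print(data.frame(phi,
                 half_life = round(half_life, 1),
                 var_ratio = round(var_ratio, 1)))
```

For $\phi_1 = 0.99$ the half-life is close to 69 lags and the variance ratio is about 50, so a series of a few hundred observations can spend most of its length on one side of the mean.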


6.4 MODEL MULTIPLICITY 

6.4.1 Multiplicity of Autoregressive-Moving Average Models 

With the normal distribution assumption, knowledge of the first and second moments
of a probability distribution implies complete knowledge of the distribution. In particular,
knowledge of the mean of $w_t = \nabla^d z_t$ and of its autocovariance function uniquely
determines the probability structure of $w_t$. We now show that although this unique
probability structure can be represented by a multiplicity of linear ARMA models, uniqueness
is achieved in the model when we introduce the appropriate stationarity and invertibility
restrictions.

Suppose that $w_t$, having autocovariance generating function $\gamma(B)$, is represented by the
linear ARMA model

$$\phi(B)w_t = \theta(B)a_t \tag{6.4.1}$$

where the zeros of $\phi(B)$ and of $\theta(B)$ lie outside the unit circle. Then, this model may also
be written as

$$\prod_{i=1}^{p}(1 - G_iB)\,w_t = \prod_{j=1}^{q}(1 - H_jB)\,a_t \tag{6.4.2}$$

where the $G_i^{-1}$ are the roots of $\phi(B) = 0$ and the $H_j^{-1}$ are the roots of $\theta(B) = 0$, and $G_i$, $H_j$ lie
inside the unit circle. Using (3.1.11), the autocovariance generating function for w is

$$\gamma(B) = \prod_{i=1}^{p}(1 - G_iB)^{-1}(1 - G_iF)^{-1}\prod_{j=1}^{q}(1 - H_jB)(1 - H_jF)\,\sigma_a^2$$

Multiple Choice of Moving Average Parameters. Since

$$(1 - H_jB)(1 - H_jF) = H_j^2(1 - H_j^{-1}B)(1 - H_j^{-1}F)$$

it follows that any one of the stochastic models

$$\prod_{i=1}^{p}(1 - G_iB)\,w_t = \prod_{j=1}^{q}(1 - H_j^{\pm 1}B)\,k\,a_t$$

can have the same autocovariance generating function if the constant k is chosen appropriately.
In the above, it is understood that for complex roots, reciprocals of both members of
the conjugate pair will be taken (so as to always obtain real-valued coefficients in the MA
operator). However, if a real root H is inside the unit circle, $H^{-1}$ will lie outside, or if a
complex pair, say $H_1$ and $H_2$, are inside, then the pair $H_1^{-1}$ and $H_2^{-1}$ will lie outside. It
follows that there will be only one stationary invertible model of the form (6.4.2), which
has a given autocovariance function.
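
The root-flipping argument can be verified numerically. In the R sketch below, an MA(2) operator with $H_1 = 0.4$ and $H_2 = 0.5$ (values chosen for illustration) is compared with the operator obtained by replacing both $H_j$ by their reciprocals. `ARMAacf()` from the stats package returns the theoretical autocorrelations; R writes MA operators with plus signs, so coefficient signs are reversed relative to the notation used here.

```r
# MA(2) operator theta(B) = 1 - 0.9 B + 0.2 B^2 = (1 - 0.4 B)(1 - 0.5 B).
theta <- c(0.9, -0.2)                  # theta_1, theta_2 in the sign convention of the text
roots <- polyroot(c(1, -theta))        # zeros of theta(B), i.e. the H_j^{-1}
H <- 1 / roots                         # H_j, both inside the unit circle

# Replace every H_j by its reciprocal: prod_j (1 - H_j^{-1} B).
H_flip <- 1 / H
ma_flip <- Re(c(-sum(H_flip), prod(H_flip)))   # coefficients of B and B^2, R's "+" convention

acf_orig <- ARMAacf(ma = -theta, lag.max = 3)
acf_flip <- ARMAacf(ma = ma_flip, lag.max = 3)
k2 <- Re(prod(H))^2                    # variance scaling k^2 = prod_j H_j^2

print(round(rbind(acf_orig, acf_flip), 4))
print(k2)
```

Both operators give the same autocorrelations, and the variances agree once the shock variance of the flipped model is multiplied by $k^2 = (H_1H_2)^2$. Only the original operator, with both $H_j$ inside the unit circle, is invertible.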

Backward Representations. Now $\gamma(B)$ also remains unchanged if in (6.4.2) we replace
$1 - G_iB$ by $1 - G_iF$ or $1 - H_jB$ by $1 - H_jF$. Thus, all the stochastic models

$$\prod_{i=1}^{p}(1 - G_iB^{\pm 1})\,w_t = \prod_{j=1}^{q}(1 - H_jB^{\pm 1})\,a_t$$

have identical autocovariance structure. However, representations containing the operator
$B^{-1} = F$ refer to future w’s and/or future a’s, so that although stationary and invertible
representations exist in which $w_t$ is expanded in terms of future w’s and a’s, only one such
representation, (6.4.2), exists that relates $w_t$ entirely to past history.

A model form that, somewhat surprisingly, is of practical interest is that in which all
B’s are replaced by F’s in (6.4.1), so that

$$\phi(F)w_t = \theta(F)e_t$$

where $e_t$ is a sequence of independently distributed random variables having mean zero
and variance $\sigma_e^2 = \sigma_a^2$. This then is a stationary invertible representation in which $w_t$ is
expressed entirely in terms of future w’s and e’s. We refer to it as the backward form of
the process, or more simply as the backward process.

Equation (6.4.2) is not the most general form of a stationary invertible linear ARMA
model having the autocovariance generating function $\gamma(B)$. For example, the model (6.4.2)
may be multiplied on both sides by any factor $1 - QB$. Thus, the process

$$(1 - QB)\prod_{i=1}^{p}(1 - G_iB)\,w_t = (1 - QB)\prod_{j=1}^{q}(1 - H_jB)\,a_t$$

has the same autocovariance structure as (6.4.2). This fact will present no particular
difficulty at the identification stage, since we will be naturally led to choose the simplest
representation, and so for uniqueness we require that there be no common factors between
the AR and MA operators in the model. However, as discussed in Chapter 7, we need to
be alert to the possibility of common factors in the estimated AR and MA operators when
fitting the process.

Finally, we reach the conclusion that a stationary-invertible model, in which a current
value $w_t$ is expressed only in terms of previous history and which contains
no common factors between the AR and MA operators, is uniquely determined by the
autocovariance structure.

Proper understanding of model multiplicity is of importance for a number of reasons:

1. We are reassured by the foregoing argument that the autocovariance function can
logically be used to identify a linear stationary-invertible ARMA model that expresses
$w_t$ in terms of previous history.

2. The nature of the multiple solutions for moving average parameters obtained by
equating moments is clarified.

3. The backward process

$$\phi(F)w_t = \theta(F)e_t$$

obtained by replacing B by F in the linear ARMA model, is useful in estimating
values of the series that have occurred before the first observation was made.

Now we consider reasons 2 and 3 in greater detail.


6.4.2 Multiple Moment Solutions for Moving Average Parameters 

In estimating the q parameters $\theta_1, \theta_2, \ldots, \theta_q$ in the MA model by equating autocovariances,
we have seen that multiple solutions are obtained. To each combination of roots, there will
be a corresponding linear representation, but to only one such combination will there be an
invertible representation in terms of past history.

For example, consider the MA(1) process in $w_t$:

$$w_t = (1 - \theta_1 B)a_t$$

and suppose that $\gamma_0(w)$ and $\gamma_1(w)$ are known and we want to deduce the values of $\theta_1$ and
$\sigma_a^2$. Since

$$\gamma_0 = (1 + \theta_1^2)\sigma_a^2 \qquad \gamma_1 = -\theta_1\sigma_a^2 \qquad \gamma_k = 0 \quad k > 1 \tag{6.4.3}$$

then

$$-\frac{\gamma_0}{\gamma_1} = \theta_1^{-1} + \theta_1$$

and if $(\theta_1 = \theta,\ \sigma_a^2 = \sigma^2)$ is a solution for given $\gamma_0$ and $\gamma_1$, so is $(\theta_1 = \theta^{-1},\ \sigma_a^2 = \theta^2\sigma^2)$.
Apparently, then, for given values of $\gamma_0$ and $\gamma_1$, there are a pair of possible models:

$$w_t = (1 - \theta B)a_t$$

and

$$w_t = (1 - \theta^{-1}B)\alpha_t \tag{6.4.4}$$

with $\mathrm{var}[a_t] = \sigma_a^2$ and $\mathrm{var}[\alpha_t] = \sigma_\alpha^2 = \sigma_a^2\theta^2$. If $-1 < \theta < 1$, then (6.4.4) is not an invertible
representation. However, this model may be written as

$$w_t = [(1 - \theta^{-1}B)(-\theta F)](-\theta^{-1}B\alpha_t)$$

Thus, after setting $e_t = -\alpha_{t-1}/\theta$, the model becomes

$$w_t = (1 - \theta F)e_t \tag{6.4.5}$$

where $e_t$ has the same variance as $a_t$. Thus, (6.4.5) is simply the “backward” process,
which is dual to the forward process:

$$w_t = (1 - \theta B)a_t \tag{6.4.6}$$

Just as the shock $a_t$ in (6.4.6) is expressible as a convergent sum of current and previous
values of w,

$$a_t = w_t + \theta w_{t-1} + \theta^2 w_{t-2} + \cdots$$

the shock $e_t$ in (6.4.5) is expressible as a convergent sum of current and future values of w:

$$e_t = w_t + \theta w_{t+1} + \theta^2 w_{t+2} + \cdots$$

Thus, the root $\theta^{-1}$ would produce an “invertible” process, but only if a representation
of the shock $e_t$ in terms of future values of w were permissible. The invertibility regions
shown in Table 6.1 delimit acceptable values of the parameters, given that we express the
shock in terms of previous history.
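
The duality between the forward and backward inversions can be illustrated on simulated data. The R sketch below generates an MA(1) series with $\theta = 0.5$ (an arbitrary choice), recovers the forward shocks from past values of w, and builds the backward shocks from future values of w. The sums are truncated at K = 60 terms, which is an approximation; the neglected terms are of order $\theta^{61}$.

```r
set.seed(1)
theta <- 0.5
n <- 300
K <- 60                                    # truncation point for the infinite sums
a <- rnorm(n + 1)                          # shocks a_0, a_1, ..., a_n
w <- a[-1] - theta * a[-(n + 1)]           # w_t = a_t - theta * a_{t-1}, t = 1, ..., n
wts <- theta^(0:K)

# Forward inversion of (6.4.6): a_t = w_t + theta * w_{t-1} + theta^2 * w_{t-2} + ...
t_fwd <- (K + 1):n
a_hat <- sapply(t_fwd, function(i) sum(wts * w[i - 0:K]))
fwd_err <- max(abs(a_hat - a[t_fwd + 1]))  # a_t is stored in a[t + 1]

# Backward inversion of (6.4.5): e_t = w_t + theta * w_{t+1} + theta^2 * w_{t+2} + ...
t_bwd <- 1:(n - K)
e_hat <- sapply(t_bwd, function(i) sum(wts * w[i + 0:K]))
m <- length(e_hat)
# The backward model w_t = e_t - theta * e_{t+1} should hold up to truncation error.
bwd_err <- max(abs(w[1:(m - 1)] - (e_hat[-m] - theta * e_hat[-1])))

print(c(fwd_err = fwd_err, bwd_err = bwd_err))
```

Both discrepancies are at the level of rounding error, so the same series w is reproduced by the forward model driven by past shocks and by the backward model driven by the future-based shocks $e_t$.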


6.4.3 Use of the Backward Process to Determine Starting Values 

Suppose that a time series $w_1, w_2, \ldots, w_n$ is available from a process

$$\phi(B)w_t = \theta(B)a_t \tag{6.4.7}$$

In Chapter 7, problems arise where we need to estimate the values $w_0, w_{-1}, w_{-2}$, and so
on, of the series that occurred before the first observation was made. This happens because
“starting values” are needed for certain basic recursive calculations used for estimating the
parameters in the model. Now, suppose that we require to estimate $w_{-l}$, given $w_1, \ldots, w_n$.
The discussion of Section 6.4.1 shows that the probability structure of $w_1, \ldots, w_n$ is equally
explained by the forward model (6.4.7), or by the backward model

$$\phi(F)w_t = \theta(F)e_t \tag{6.4.8}$$

The value $w_{-l}$, thus, bears exactly the same probability relationship to the
sequence $w_1, w_2, \ldots, w_n$, as does the value $w_{n+l+1}$ to the sequence $w_n, w_{n-1},
w_{n-2}, \ldots, w_1$. Thus, to estimate a value $l + 1$ periods before observations started, we can
first consider what would be the optimal estimate or forecast $l + 1$ periods after the series
ended, and then apply this procedure to the reversed series. In other words, we “forecast”
the reversed series. We call this “back forecasting.”
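
A minimal sketch of back forecasting in R, under the assumption of a zero-mean AR(1) model fitted to simulated data: the series is reversed with `rev()`, a model is fitted with `arima()`, and `predict()` then supplies forecasts of the reversed series, which are read as estimates of $w_0, w_{-1}, w_{-2}$. The simulated series and the choice of model are illustrative only.

```r
set.seed(42)
w <- as.numeric(arima.sim(model = list(ar = 0.7), n = 200))   # simulated w_1, ..., w_n

# Fit the model to the reversed series and forecast it: "back forecasting".
fit_rev <- arima(rev(w), order = c(1, 0, 0), include.mean = FALSE)
back <- predict(fit_rev, n.ahead = 3)$pred     # estimates of w_0, w_{-1}, w_{-2}
phi_hat <- coef(fit_rev)[["ar1"]]

print(as.numeric(back))
print(phi_hat^(1:3) * w[1])                    # AR(1) closed form: phi^h times w_1
```

For this simple model the back forecasts decay geometrically from the first observation $w_1$, exactly as ordinary forecasts decay from the last observation.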


APPENDIX A6.1 EXPECTED BEHAVIOR OF THE ESTIMATED 
AUTOCORRELATION FUNCTION FOR A NONSTATIONARY PROCESS 

Suppose that a series of N observations $z_1, z_2, \ldots, z_N$ is generated by a nonstationary
(0,1,1) process

$$\nabla z_t = (1 - \theta B)a_t$$

and the estimated autocorrelations $r_k$ are computed, where

$$r_k = \frac{c_k}{c_0} = \frac{\sum_{t=1}^{N-k}(z_t - \bar{z})(z_{t+k} - \bar{z})}{\sum_{t=1}^{N}(z_t - \bar{z})^2}$$

Some idea of the behavior of these estimated autocorrelations may be obtained by deriving
expected values for the numerator and denominator of this expression and considering the
ratio. We will write, following Wichern (1973),

$$\mathcal{E}[r_k] = \frac{E[c_k]}{E[c_0]} = \frac{\sum_{t=1}^{N-k} E[(z_t - \bar{z})(z_{t+k} - \bar{z})]}{\sum_{t=1}^{N} E[(z_t - \bar{z})^2]}$$

After straightforward but tedious algebra, we find that

$$\mathcal{E}[r_k] = \frac{(N - k)[(1 - \theta)^2(N^2 - 1 + 2k^2 - 4kN) - 6\theta]}{N(N - 1)[(N + 1)(1 - \theta)^2 + 6\theta]} \tag{A6.1.1}$$

For $\theta$ close to zero, $\mathcal{E}[r_k]$ will be close to unity, but for large values of $\theta$, it can be
considerably smaller than unity, even for small values of k. Figure A6.1 illustrates this
fact by showing values of $\mathcal{E}[r_k]$ for $\theta = 0.8$ with N = 100 and N = 200. Although, as
anticipated for a nonstationary process, the ratios $\mathcal{E}[r_k]$ of expected values fail to damp out
quickly, it will be seen that they do not approach the value 1 even for small lags.
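
Expression (A6.1.1) is simple to evaluate directly. The R sketch below, with the hypothetical helper `expected_acf_ratio`, tabulates the ratio for $\theta = 0.8$ at the two sample sizes mentioned in the text.

```r
# Ratio of expected values E[c_k] / E[c_0] from (A6.1.1) for a (0,1,1) process.
# expected_acf_ratio is a hypothetical helper name used only for this sketch.
expected_acf_ratio <- function(k, N, theta) {
  num <- (N - k) * ((1 - theta)^2 * (N^2 - 1 + 2 * k^2 - 4 * k * N) - 6 * theta)
  den <- N * (N - 1) * ((N + 1) * (1 - theta)^2 + 6 * theta)
  num / den
}

k <- 1:10
tab <- cbind(lag  = k,
             N100 = expected_acf_ratio(k, N = 100, theta = 0.8),
             N200 = expected_acf_ratio(k, N = 200, theta = 0.8))
print(round(tab, 3))
```

At lag 1 the ratio is about 0.43 for N = 100 and about 0.61 for N = 200, well below unity, whereas setting `theta = 0` with N = 100 gives about 0.95.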

Similar effects may be demonstrated whenever the parameters approach values where
cancellation on both sides of the model would produce a stationary process. For instance,
in the example above we can write the model as

$$(1 - B)z_t = [(1 - B) + \delta B]a_t$$

where $\delta = 0.2$. As $\delta$ tends to zero, the behavior of the process would be expected to come
closer and closer to that of the white noise process $z_t = a_t$, for which the autocorrelation
function is zero for lags k > 0.

[Figure: panel (a) N = 100; horizontal axis: lag k]

FIGURE A6.1 $\mathcal{E}[r_k] = E[c_k]/E[c_0]$ for series generated by $\nabla z_t = (1 - 0.8B)a_t$.


EXERCISES 

6.1. Given the five identified models and the corresponding values of the estimated
autocorrelations of $w_t = \nabla^d z_t$ in the following table:

        Identified Model
        p    d    q      Estimated Autocorrelations
(1)     1    1    0      $r_1 = 0.72$
(2)     0    1    1      $r_1 = -0.41$
(3)     1    0    1      $r_1 = 0.40$, $r_2 = 0.32$
(4)     0    2    2      $r_1 = 0.62$, $r_2 = 0.13$
(5)     2    1    0      $r_1 = 0.93$, $r_2 = 0.81$

(a) Obtain preliminary estimates of the parameters analytically.

(b) Check these estimates using the charts and tables in Part Five of the book.

(c) Write down the identified models in backward shift operator notation with the
preliminary estimates inserted.

6.2. For the (2, 1, 0) process considered on line (5) of Exercise 6.1, the sample mean and
variance of $w_t = \nabla z_t$ are $\bar{w} = 0.23$ and $s_w^2 = 0.25$. If the series contains N = 101
observations,

(a) show that a constant term needs to be included in the model.

(b) express the model in the form $w_t - \phi_1 w_{t-1} - \phi_2 w_{t-2} = \theta_0 + a_t$ with numerical
values inserted for the parameters, including an estimate of $\sigma_a^2$.

6.3. Consider the chemical process temperature readings referred to as Series C in this 
book. 

(a) Plot the original series and the series of first differences using R. 

(b) Use the R package to calculate and plot the ACF and PACF of this series. Repeat 
the calculation for the first and second differences of the series. 

(c) Specify a suitable model, or models, for this series. Use the method of moments 
to obtain preliminary parameter estimates for the series. 

6.4. Quarterly measurements of the gross domestic product (GDP) in the United Kingdom 
over the period 1955-1969 are included in Series P in Part Five of this book. 

(a) Calculate and plot the ACF and PACF of this series. 

(b) Repeat the analysis in part (a) for the first differences of the series. 

(c) Identify a model for the series. Would a log transformation of the data be helpful? 

(d) Obtain preliminary estimates for the parameters and for their standard errors. 

(e) Obtain preliminary estimates for $\mu_z$ and $\sigma_a^2$.

6.5. Quarterly UK unemployment rate (in thousands) is part of Series P analyzed in Exercise 
6.4. Repeat parts (a) to (e) of Exercise 6.4 for this series. 

6.6. A time series defined by $z_t = 1000\log_{10}(H_t)$, where $H_t$ is the price of hogs recorded
annually by the U.S. Census of Agriculture on January 1 for each of the 82 years from
1867 to 1948, is listed as Series Q in the Collection of Time Series in Part Five. This
is a well-known time series analyzed by Quenouille (1957) and others.

(a) Plot the series. Compute and plot the ACF and PACF of the series. 

(b) Identify a time series model for the series. 

6.7. Measurements of the annual flow of the river Nile at Ashwan from 1871 to 1970 are 
available as series “Nile” in the datasets package in R; type help(Nile) for details. 

(a) Plot the series and compute the ACF and PACF for the series. 

(b) Repeat the analysis in part (a) for the differenced series. 

(c) Identify a model for the series. Are there any unusual features worth noting?

6.8. The file “EuStockMarkets” in the R datasets package contains the daily closing 
prices of four major European stock indices: Germany DAX (Ibis), Switzerland SMI, 
France CAC, and UK FTSE. The data are sampled in business time, so weekends and 
holidays are omitted. 

(a) Plot each of the four series and compute the ACF and PACF for the series. 

(b) Repeat the analysis in part (a) for the differenced series. 

(c) Identify a model for the series. Are there any unusual features worth noting?

6.9. Download a time series of your choice from the Internet. Plot the time series and
identify a suitable model for the series.


PARAMETER ESTIMATION


This chapter deals with the estimation of the parameters in ARIMA models and provides a 
general account of likelihood and Bayesian methods for parameter estimation. It is assumed 
that a suitable model of this form has been selected using the model specification tools 
described in Chapter 6. After the parameters have been estimated, the fitted model will be 
subjected to diagnostic checks and goodness-of-fit tests to be described in the next chapter. 
As pointed out by R. A. Fisher, for tests of goodness of fit to be relevant, it is necessary 
that efficient use of data should have been made in the fitting process. If this is not so, 
inadequacy of fit may simply arise because of the inefficient fitting and not because the 
form of the model is inadequate. This chapter examines in detail maximum likelihood 
estimation under the normality assumption and describes least-squares approximations that 
are suitable for many series. 

It is assumed that the reader is familiar with certain basic ideas in estimation theory. 
Appendices A7.1 and A7.2 summarize some important results in normal distribution theory 
and linear least-squares that are useful for this chapter. Throughout the chapter, bold type
is used to denote vectors and matrices. Thus, $\mathbf{X} = \{x_{ij}\}$ is a matrix with $x_{ij}$ an element in
the ith row and jth column, and $\mathbf{X}'$ is the transpose of the matrix $\mathbf{X}$.


7.1 STUDY OF THE LIKELIHOOD AND SUM-OF-SQUARES FUNCTIONS 

7.1.1 Likelihood Function 

Suppose that we have a sample of N observations, $\mathbf{z}$, with which we associate an
N-dimensional random variable, whose known probability distribution $p(\mathbf{z}|\boldsymbol{\xi})$ depends
on some unknown parameters $\boldsymbol{\xi}$. We use the vector $\boldsymbol{\xi}$ to denote a general set of parameters
and, in particular, it could refer to the $p + q + 1$ parameters $(\boldsymbol{\phi}, \boldsymbol{\theta}, \sigma_a^2)$ of the ARIMA model.


Time Series Analysis: Forecasting and Control, Fifth Edition. George E. P. Box, Gwilym M. Jenkins,
Gregory C. Reinsel, and Greta M. Ljung

Before the data are available, $p(\mathbf{z}|\boldsymbol{\xi})$ will associate a density with each different outcome
$\mathbf{z}$ of the experiment, for fixed $\boldsymbol{\xi}$. After the data have become available, we are led to
contemplate the various values of $\boldsymbol{\xi}$ that might have given rise to the fixed set of observations
$\mathbf{z}$ actually obtained. The appropriate function for this purpose is the likelihood function
$L(\boldsymbol{\xi}|\mathbf{z})$, which is of the same form as $p(\mathbf{z}|\boldsymbol{\xi})$, but in which $\mathbf{z}$ is now fixed but $\boldsymbol{\xi}$ is variable.
It is only the relative value of $L(\boldsymbol{\xi}|\mathbf{z})$ that is of interest, so that the likelihood function is
usually regarded as containing an arbitrary multiplicative constant.

It is often convenient to work with the log-likelihood function $\ln[L(\boldsymbol{\xi}|\mathbf{z})] = l(\boldsymbol{\xi}|\mathbf{z})$,
which contains an arbitrary additive constant. One reason that the likelihood function is of
fundamental importance in estimation theory is the likelihood principle, urged
on somewhat different grounds by Fisher (1956), Barnard (1949), and Birnbaum (1962).
This principle says that, given that the assumed model is correct, all that the data have to
tell us about the parameters is contained in the likelihood function, all other aspects of the
data being irrelevant. From a Bayesian point of view, the likelihood function is equally
important, since it is the component in the posterior distribution of the parameters that
comes from the data.

For a complete understanding of parameter estimation in a specific case, it is necessary
to study the likelihood function carefully or, in the Bayesian framework, the
posterior distribution of the parameters, which, in the cases we consider, is dominated by
the likelihood. In many examples, for moderate and large samples, the log-likelihood function
will be unimodal and can be approximated adequately over a sufficiently extensive
region near the maximum by a quadratic function. In such cases, the log-likelihood function
can be described by its maximum and its second derivatives at the maximum. The values
of the parameters that maximize the likelihood function, or equivalently the log-likelihood
function, are called maximum likelihood (ML) estimates. The second derivatives of the
log-likelihood function provide measures of “spread” of the likelihood function and can
be used to calculate approximate standard errors for the estimates.

The limiting properties of maximum likelihood estimates are usually established for 
independent observations. But as was shown by Whittle (1953), they may be extended 
to cover stationary time series. Other early literature on the parameter estimation in time 
series models includes Barnard et al. (1962), Bartlett (1955), Durbin (1960), Grenander 
and Rosenblatt (1957), Hannan (1960), and Quenouille (1942, 1957). 

7.1.2 Conditional Likelihood for an ARIMA Process 

Let us suppose that the $N = n + d$ original observations $\mathbf{z}$ form a time series that we
denote by $z_{-d+1}, \ldots, z_0, z_1, z_2, \ldots, z_n$. We assume that this series is generated by an
ARIMA($p, d, q$) model. From these observations, we can generate a series $\mathbf{w}$ of $n = N - d$
differences $w_1, w_2, \ldots, w_n$, where $w_t = \nabla^d z_t$. Thus, the general problem of fitting the
parameters $\boldsymbol{\phi}$ and $\boldsymbol{\theta}$ of the ARIMA model (6.1.1) is equivalent to fitting to the $w_t$'s the
stationary and invertible¹ ARMA($p, q$) model, which may be written as

$$a_t = \tilde{w}_t - \phi_1 \tilde{w}_{t-1} - \phi_2 \tilde{w}_{t-2} - \cdots - \phi_p \tilde{w}_{t-p} + \theta_1 a_{t-1} + \theta_2 a_{t-2} + \cdots + \theta_q a_{t-q} \tag{7.1.1}$$

where $\tilde{w}_t = w_t - \mu$ are the mean-centered observations.


1 Special care is needed to ensure that the estimate lies in the invertible region. See Appendix A7.7.

For $d > 0$, it is often appropriate to assume that $\mu = 0$. When this is not appropriate, we
assume that the series mean $\bar{w} = \sum_{t=1}^{n} w_t / n$ is substituted for $\mu$. For many sample sizes
common in practice, this approximation will be adequate. However, if desired, $\mu$ can be
included as an additional parameter to be estimated.

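The $a_t$'s cannot be calculated immediately from (7.1.1) because of the difficulty of
starting up the difference equation. However, suppose that the $p$ values $\mathbf{w}_*$ of the $w_t$'s and
the $q$ values $\mathbf{a}_*$ of the $a_t$'s prior to the start of the $w_t$ series were given. Then, for any choice
of parameters $(\boldsymbol{\phi}, \boldsymbol{\theta})$, we could calculate successively a set of values $a_t(\boldsymbol{\phi}, \boldsymbol{\theta} \mid \mathbf{w}_*, \mathbf{a}_*, \mathbf{w})$,
$t = 1, 2, \ldots, n$. Now, assuming that the $a_t$'s are normally distributed, their probability
density is

$$p(a_1, a_2, \ldots, a_n) \propto (\sigma_a^2)^{-n/2} \exp\left[-\sum_{t=1}^{n} \frac{a_t^2}{2\sigma_a^2}\right]$$

Given the data $\mathbf{w}$, the log-likelihood associated with the parameter values $(\boldsymbol{\phi}, \boldsymbol{\theta}, \sigma_a^2)$,
conditional on the choice of $(\mathbf{w}_*, \mathbf{a}_*)$, would then be

$$l_*(\boldsymbol{\phi}, \boldsymbol{\theta}, \sigma_a^2) = -\frac{n}{2}\ln(\sigma_a^2) - \frac{S_*(\boldsymbol{\phi}, \boldsymbol{\theta})}{2\sigma_a^2} \tag{7.1.2}$$

where

$$S_*(\boldsymbol{\phi}, \boldsymbol{\theta}) = \sum_{t=1}^{n} a_t^2(\boldsymbol{\phi}, \boldsymbol{\theta} \mid \mathbf{w}_*, \mathbf{a}_*, \mathbf{w}) \tag{7.1.3}$$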


In the above equations, a subscript asterisk is used on the likelihood and sum-of-squares
functions to emphasize that they are conditional on the choice of the starting values. We
notice that the conditional log-likelihood $l_*$ involves the data only through the conditional
sum-of-squares function. It follows that contours of $l_*$ for any fixed value of $\sigma_a^2$ in the space
of $(\boldsymbol{\phi}, \boldsymbol{\theta}, \sigma_a^2)$ are contours of $S_*$, that these maximum likelihood estimates are the same as
the least-squares estimates, and that in general, we can, on the normal assumption, study the
behavior of the conditional likelihood by studying the conditional sum-of-squares function.
In particular, for any fixed $\sigma_a^2$, $l_*$ is a linear function of $S_*$. The parameter values obtained
by minimizing the conditional sum-of-squares function $S_*(\boldsymbol{\phi}, \boldsymbol{\theta})$ will be called conditional
least-squares estimates.
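
The recursion behind (7.1.1) to (7.1.3) can be sketched in a few lines of Python. This is a minimal illustration rather than code from the text: the function names `conditional_residuals` and `conditional_loglik`, the argument names, and the default of zero presample values are assumptions made for the example, and the series passed in is taken to be mean-centered already.

```python
import math

def conditional_residuals(w, phi, theta, w_start=None, a_start=None):
    """Residuals a_1..a_n from (7.1.1), given presample values.

    w       : mean-centered observations w_1..w_n
    phi     : [phi_1, ..., phi_p]
    theta   : [theta_1, ..., theta_q]
    w_start : presample values [w_{1-p}, ..., w_0] (zeros if omitted)
    a_start : presample values [a_{1-q}, ..., a_0] (zeros if omitted)
    """
    p, q = len(phi), len(theta)
    w_all = (list(w_start) if w_start is not None else [0.0] * p) + list(w)
    a_all = list(a_start) if a_start is not None else [0.0] * q
    for t in range(len(w)):                 # t = 0 corresponds to time 1
        ar = sum(phi[i] * w_all[p + t - 1 - i] for i in range(p))
        ma = sum(theta[j] * a_all[q + t - 1 - j] for j in range(q))
        a_all.append(w_all[p + t] - ar + ma)
    return a_all[q:]                        # drop the presample a's

def conditional_loglik(w, phi, theta, sigma2):
    """Return (l_*, S_*) as in (7.1.2) and (7.1.3), with zero starting values."""
    a = conditional_residuals(w, phi, theta)
    s_star = sum(x * x for x in a)
    n = len(w)
    return -0.5 * n * math.log(sigma2) - s_star / (2.0 * sigma2), s_star
```

For a fixed value of `sigma2`, the first returned value is a decreasing linear function of the second, which is the property noted above.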


7.1.3 Choice of Starting Values for Conditional Calculation 

We will shortly discuss the calculation of the unconditional likelihood, which, strictly, is
what we need for parameter estimation. However, when $n$ is moderate or large, a sufficient
approximation to the unconditional likelihood is often obtained by using the conditional
likelihood with suitable values substituted for the elements of $\mathbf{w}_*$ and $\mathbf{a}_*$ in (7.1.3). One
procedure is to set the elements of $\mathbf{w}_*$ and of $\mathbf{a}_*$ equal to their unconditional expectations.
The unconditional expectations of the elements of $\mathbf{a}_*$ are zero, and if the model contains no
deterministic part, and in particular if $\mu = 0$, the unconditional expectations of the elements
of $\mathbf{w}_*$ will also be zero.² However, this approximation can be poor if some of the roots
of $\phi(B) = 0$ are close to the boundary of the unit circle, so that the process approaches
nonstationarity. This is also true if some of the roots of $\theta(B) = 0$ are close to the boundary
of the invertibility region. Setting the presample values equal to zero could in these cases
introduce a large transient, which is slow to die out. For a pure AR($p$) model, a more reliable
approximation procedure, and one we employ sometimes, is to use (7.1.1) to calculate the
$a_t$'s from $a_{p+1}$ onward, thus using actual values of the $w_t$'s throughout. Using this method,
we have only $n - p = N - p - d$ values of $a_t$, but the slight loss of information will be
unimportant for long series.

TABLE 7.1 Sum-of-Squares Functions for the Model $\nabla z_t = (1 - \theta B)a_t$ Fitted to the IBM Data

$\theta$     $\lambda = 1 - \theta$     $S_*(\theta)$     $S(\theta)$
-0.5         1.5                        23,929            23,928
-0.4         1.4                        21,595            21,595
-0.3         1.3                        20,222            20,222
-0.2         1.2                        19,483            19,483
-0.1         1.1                        19,220            19,220
 0.0         1.0                        19,363            19,363
 0.1         0.9                        19,896            19,896
 0.2         0.8                        20,851            20,851
 0.3         0.7                        22,315            22,314
 0.4         0.6                        24,471            24,468
 0.5         0.5                        27,694            27,691

For seasonal series, discussed in Chapter 9, the conditional approximation is not always
satisfactory and the unconditional calculation becomes necessary. Inclusion of the determinant
in the unconditional likelihood function can also be important for seasonal time
series.

Example: IMA(0, 1, 1) Process. To illustrate the recursive calculation of the conditional
sum of squares $S_*$, we consider the IMA(0, 1, 1) model tentatively identified in Section 6.4
for the IBM data in Series B. The model is

$$\nabla z_t = (1 - \theta B)a_t \qquad -1 < \theta < 1 \tag{7.1.4}$$

so that $a_t = w_t + \theta a_{t-1}$, where $w_t = \nabla z_t$ and $E[w_t] = 0$. Thus, for the particular parameter
value $\theta = 0.5$, the $a_t$'s are calculated recursively from

$$a_t = w_t + 0.5a_{t-1}$$

setting the initial value $a_0$ equal to zero. Proceeding in this way, we find that

$$S_*(0.5) = \sum_{t=1}^{368} a_t^2(\theta = 0.5 \mid a_0 = 0) = 27{,}694$$

The conditional sums of squares $S_*(\theta)$ are shown in Table 7.1 for values of $\theta$ from $-0.5$ to
$+0.5$ in steps of 0.1. We note that $S_*(\theta)$ has its minimum for $\theta = -0.1$. This is consistent
with the preliminary moment estimate of $-0.09$ derived for this series in Chapter 6.
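
The tabulation procedure can be sketched as follows. Since the full Series B is not reproduced in this section, the sketch uses only the first ten values of the series as listed in Table 7.2, so its sums of squares are not those of Table 7.1; the names `s_star`, `grid`, and `table` are illustrative choices, not notation from the text.

```python
# First ten values of Series B (z_0, ..., z_9), as listed in Table 7.2.
z = [460, 457, 452, 459, 462, 459, 463, 479, 493, 490]
w = [z[t] - z[t - 1] for t in range(1, len(z))]   # w_t = z_t - z_{t-1}

def s_star(theta, w, a0=0.0):
    """Conditional sum of squares for the IMA(0, 1, 1) model, given a_0."""
    a_prev, total = a0, 0.0
    for wt in w:
        a_t = wt + theta * a_prev      # a_t = w_t + theta * a_{t-1}
        total += a_t * a_t
        a_prev = a_t
    return total

# Evaluate S_*(theta) on the grid -0.5, -0.4, ..., 0.5 used in Table 7.1.
grid = [round(-0.5 + 0.1 * k, 1) for k in range(11)]
table = {theta: s_star(theta, w) for theta in grid}
for theta in grid:
    print(f"{theta:5.1f}  {table[theta]:10.3f}")
```

With so few observations the grid values only illustrate the mechanics; a usable estimate requires the whole series.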


2 If the assumption $E[w_t] = \mu \neq 0$ is appropriate, we can substitute $\bar{w}$ for each of the elements of $\mathbf{w}_*$.


7.1.4 Unconditional Likelihood, Sum-of-Squares Function, and Least-Squares Estimates

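Assuming that the $N = n + d$ observations are generated by an ARIMA model, the unconditional
log-likelihood is given by

$$l(\boldsymbol{\phi}, \boldsymbol{\theta}, \sigma_a^2) = f(\boldsymbol{\phi}, \boldsymbol{\theta}) - \frac{n}{2}\ln(\sigma_a^2) - \frac{S(\boldsymbol{\phi}, \boldsymbol{\theta})}{2\sigma_a^2} \tag{7.1.5}$$

where $f(\boldsymbol{\phi}, \boldsymbol{\theta})$ involves the determinant in the joint density of the $w_t$'s and is a function of
$\boldsymbol{\phi}$ and $\boldsymbol{\theta}$. The unconditional sum-of-squares function is given by

$$S(\boldsymbol{\phi}, \boldsymbol{\theta}) = \sum_{t=1}^{n} [a_t \mid \mathbf{w}, \boldsymbol{\phi}, \boldsymbol{\theta}]^2 + [\mathbf{e}_*]'\boldsymbol{\Omega}^{-1}[\mathbf{e}_*] \tag{7.1.6}$$

where $[a_t \mid \mathbf{w}, \boldsymbol{\phi}, \boldsymbol{\theta}] = E[a_t \mid \mathbf{w}, \boldsymbol{\phi}, \boldsymbol{\theta}]$ denotes the expectation of $a_t$ conditional on $\mathbf{w}$, $\boldsymbol{\phi}$, and
$\boldsymbol{\theta}$. When the meaning is clear from the context, we will further abbreviate this conditional
expectation to $[a_t]$. In (7.1.6),

$$\mathbf{e}_* = (w_{1-p}, \ldots, w_0, a_{1-q}, \ldots, a_0)'$$

represents the $p + q$ initial values of the $w_t$ and $a_t$ prior to $t = 1$, $\boldsymbol{\Omega}\sigma_a^2 = \operatorname{cov}[\mathbf{e}_*]$ is the
covariance matrix of $\mathbf{e}_*$, and

$$[\mathbf{e}_*] = ([w_{1-p}], \ldots, [w_0], [a_{1-q}], \ldots, [a_0])'$$

denotes the vector of conditional expectations (“back-forecasts”) of the initial values,
given $\mathbf{w}$, $\boldsymbol{\phi}$, and $\boldsymbol{\theta}$. An alternative way to represent $S(\boldsymbol{\phi}, \boldsymbol{\theta})$ is as

$$S(\boldsymbol{\phi}, \boldsymbol{\theta}) = \sum_{t=-\infty}^{n} [a_t]^2$$

which in comparison with (7.1.6) indicates that $\sum_{t=-\infty}^{0} [a_t]^2 = [\mathbf{e}_*]'\boldsymbol{\Omega}^{-1}[\mathbf{e}_*]$.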

Usually, $f(\boldsymbol{\phi}, \boldsymbol{\theta})$ is of importance only for small $n$. For moderate and large values
of $n$, (7.1.5) is dominated by $S(\boldsymbol{\phi}, \boldsymbol{\theta})/2\sigma_a^2$, and thus the contours of the unconditional
sum-of-squares function in the space of the parameters $(\boldsymbol{\phi}, \boldsymbol{\theta})$ are very nearly contours
of the likelihood and log-likelihood. It follows, in particular, that the parameter estimates
obtained by minimizing the sum of squares (7.1.6), which we call (unconditional or exact)
least-squares estimates, will usually provide very close approximations to the maximum
likelihood estimates. From a Bayesian viewpoint, on assumptions discussed in Section 7.5,
for all AR($p$) and MA($q$), essentially the posterior density is a function only of $S(\boldsymbol{\phi}, \boldsymbol{\theta})$.
Hence, very nearly the least-squares estimates are those with maximum posterior density.
In the remainder of this section and in Section 7.1.5, the main emphasis will be on the
unconditional sum-of-squares function $S(\boldsymbol{\phi}, \boldsymbol{\theta})$ in (7.1.6), and its use in calculating
least-squares estimates. An alternate method for calculation of the unconditional sum of squares
and likelihood functions based on the state-space model and innovations approach will be
discussed in Section 7.4.

In the calculation of the unconditional sum of squares, the $[a_t]$'s are computed recursively
by taking conditional expectations in (7.1.1). A preliminary back-calculation provides the
values $[w_{-j}]$ and $[a_{-j}]$, $j = 0, 1, 2, \ldots$ (i.e., the back-forecasts) needed to start off the
forward recursion.

Calculation of the Unconditional Sum of Squares for a Moving Average Process. For
illustration, we reconsider the IBM stock price example using only the first 10 values of
the series.³ For the IMA(0, 1, 1) model, the only back-forecast that is needed for $S(\theta)$ is
$[a_0]$. We begin by describing an approximate, but nevertheless accurate, method to obtain
$[a_0]$. Recall from Section 6.4.3 that the model for $w_t$ may be written in either the forward
or backward forms:

$$w_t = (1 - \theta B)a_t \qquad w_t = (1 - \theta F)e_t$$

where again $\mu = E[w_t]$ is assumed equal to zero. Hence, we can write

$$[e_t] = [w_t] + \theta[e_{t+1}] \tag{7.1.7}$$

$$[a_t] = [w_t] + \theta[a_{t-1}] \tag{7.1.8}$$

where $[w_t] = w_t$ for $t = 1, 2, \ldots, n$ and is the back-forecast of $w_t$ for $t \le 0$. These are
the two basic equations that we need in the computations. A convenient format for the
calculations is shown in Table 7.2. We begin by entering in the table what we know:

1. The data values $z_0, z_1, \ldots, z_9$, from which we can calculate the first differences
$w_1, w_2, \ldots, w_9$.

2. The values $[e_0], [e_{-1}], \ldots$, which are zero, since $e_0, e_{-1}, \ldots$ are distributed independently
of $\mathbf{w}$.

3. The values $[a_{-1}], [a_{-2}], \ldots$, which are zero, because for any MA($q$) process,
$a_{-q}, a_{-q-1}, \ldots$ are distributed independently of $\mathbf{w}$. However, note that
$[a_0], [a_{-1}], \ldots, [a_{-q+1}]$ will be nonzero and can be obtained by back-forecasting.
Thus, in the present example, $[a_0]$ is computed this way.

Beginning at the end of the series, (7.1.7) is now used to compute the $[e_t]$'s for
$t = 9, 8, 7, \ldots, 1$. We start the backward process by setting $[e_{10}] = 0$. The effect of this
approximation will be to introduce a transient into the system. However, for series of moderate
length, the effect will typically be negligible by the time the beginning of the series is
reached and thus will not affect the calculation of the $a_t$'s. If desired, the adequacy of this
approximation can be checked in any given case by performing a second iterative cycle.

Thus, to start the recursion in Table 7.2, in the row corresponding to $t = 9$, we enter a
zero in the sixth column for the unknown value $0.5[e_{10}]$. Then, using (7.1.7), we obtain

$$[e_9] = [w_9] + 0.5[e_{10}] = w_9 + 0 = -3$$

3 In practice, of course, useful parameter estimates could not be obtained from as few as 10 observations. We 
utilize this data subset merely to illustrate the calculations.

TABLE 7.2 Calculation of the $[a_t]$'s from the First 10 Values of Series B, Using $\theta = 0.5$

  t    $z_t$      $[a_t]$   $0.5[a_{t-1}]$   $[w_t]$   $0.5[e_{t+1}]$   $[e_t]$   $u_t$
 -1    [458.4]     0          0               0          0               0
  0    460         1.6        0               1.6       -1.6             0        -2.1
  1    457        -2.2        0.8            -3.0       -0.1            -3.1      -4.1
  2    452        -6.1       -1.1            -5.0        4.8            -0.2      -2.3
  3    459         3.9       -3.0             7.0        2.6             9.6       8.5
  4    462         5.0        2.0             3.0        2.3             5.3       9.5
  5    459        -0.5        2.5            -3.0        7.6             4.6       9.2
  6    463         3.7       -0.2             4.0       11.1            15.1      19.4
  7    479        17.9        1.9            16.0        6.2            22.2      31.4
  8    493        22.9        9.0            14.0       -1.5            12.5
  9    490         8.5       11.5            -3.0        0              -3.0       8.5

so $0.5[e_9] = -1.5$ can be entered in the line $t = 8$, which enables us to compute $[e_8]$, and
so on. Finally, we obtain

$$[e_0] = [w_0] + \theta[e_1]$$

that is, $0 = [w_0] - 1.6$, which gives $[w_0] = 1.6$, and thereafter $[w_{-h}] = 0$, $h = 1, 2, 3, \ldots$.
Now, using (7.1.8) with $t = 0$, we obtain

$$[a_0] = [w_0] + \theta[a_{-1}] = 1.6 + (0.5)(0) = 1.6$$

and we can then continue the forward calculations of the remaining $[a_t]$'s, leading to
$S(0.5) = \sum_{t=0}^{9} [a_t \mid 0.5, \mathbf{w}]^2 = 1016.406$.
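
The back-forecasting steps just described can be sketched in Python for the MA(1) case. The function name `backcast_ma1` and its return layout are illustrative assumptions; the recursions are (7.1.7) and (7.1.8) with $[e_{n+1}] = 0$. Carried at full precision, the back-forecast $[w_0]$ comes out as about 1.545 rather than the rounded table entry.

```python
def backcast_ma1(w, theta):
    """Approximate unconditional sum of squares for w_t = (1 - theta B) a_t.

    Returns ([w_0], [[a_0], ..., [a_n]], S(theta)).
    """
    # Backward recursion (7.1.7): [e_t] = w_t + theta [e_{t+1}], starting
    # from the approximation [e_{n+1}] = 0 and running t = n, ..., 1.
    e = 0.0
    for wt in reversed(w):
        e = wt + theta * e                 # after the loop, e holds [e_1]
    # [e_0] = 0 = [w_0] + theta [e_1], so the back-forecast of w_0 is:
    w0 = -theta * e
    # Forward recursion (7.1.8), started from [a_{-1}] = 0.
    a = [w0]                               # [a_0] = [w_0] + theta * 0
    for wt in w:
        a.append(wt + theta * a[-1])
    return w0, a, sum(x * x for x in a)

# First differences of the ten Series B values in Table 7.2.
w = [-3, -5, 7, 3, -3, 4, 16, 14, -3]
w0, a, s = backcast_ma1(w, 0.5)
print(round(w0, 3), round(s, 3))
```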

An alternative method that yields exact estimates of the presample values is presented
in Appendix A7.3. For the model considered above, this method involves first computing
the values $a_t(a_0 = 0)$, which we abbreviate as $a_t^0$, by the conditional method as
$a_t^0 = w_t + \theta a_{t-1}^0$, $t = 1, 2, \ldots, n$, using $a_0^0 = 0$ as the initial value. Then a backward
recursion is performed to obtain $u_t = a_t^0 + \theta u_{t+1}$, beginning from $t = n$, down to $t = 0$,
with $u_{n+1} = 0$ as the starting value. Finally, then, the exact estimate of $[a_0]$ is given by
$[a_0] = -u_0(1 - \theta^2)/(1 - \theta^{2(n+1)})$. Using this starting value, the $[a_t]$ are computed from the
forward recursion $[a_t] = w_t + \theta[a_{t-1}]$, $t = 1, 2, \ldots, n$, as in (7.1.8) and the exact sum of
squares becomes $S(\theta) = \sum_{t=0}^{n} [a_t]^2$.

In the above example, by first computing the $a_t^0$ using a forward recursion setting
$a_0^0 = 0$, we obtain the values of $u_t$ by the backward recursion for $t = 9, 8, \ldots, 0$, displayed
in the final column of Table 7.2. Hence, we obtain the exact estimate of $a_0$ as
$[a_0] = -u_0(1 - \theta^2)/(1 - \theta^{2(n+1)}) = 1.549$. This value is very close to the approximate value of
1.545 obtained by the backward model approach, and the small difference has essentially
no effect on the calculation of the remaining values $[a_t]$. Using the exact method for the
entire series, we find that the unconditional sum of squares for $\theta = 0.5$ is

$$S(0.5) = \sum_{t=0}^{368} [a_t \mid 0.5, \mathbf{w}]^2 = 27{,}691$$

which for this particular example is very close to the conditional value $S_*(0.5) = 27{,}694$.
The unconditional sums of squares $S(\theta)$, for values of $\theta$ between $-0.5$ and $+0.5$, have been
added to Table 7.1 and are very close to the conditional values $S_*(\theta)$ computed earlier.
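
The exact calculation for the MA(1) case can be sketched in the same style. The function name `exact_ma1` is an assumption for illustration; the steps follow the description above (conditional residuals, backward recursion for $u_t$, the closed-form expression for $[a_0]$, then the forward recursion).

```python
def exact_ma1(w, theta):
    """Exact back-forecast [a_0] and unconditional S(theta) for an MA(1)."""
    n = len(w)
    # Step 1: conditional residuals a0[t], t = 0..n, with a0[0] = 0.
    a0 = [0.0]
    for wt in w:
        a0.append(wt + theta * a0[-1])
    # Step 2: backward recursion u_t = a0_t + theta u_{t+1}, u_{n+1} = 0.
    u = 0.0
    for t in range(n, -1, -1):
        u = a0[t] + theta * u              # after the loop, u holds u_0
    # Step 3: exact estimate of a_0.
    a_hat0 = -u * (1 - theta ** 2) / (1 - theta ** (2 * (n + 1)))
    # Step 4: forward recursion (7.1.8) from the exact starting value.
    a = [a_hat0]
    for wt in w:
        a.append(wt + theta * a[-1])
    return a_hat0, sum(x * x for x in a)

w = [-3, -5, 7, 3, -3, 4, 16, 14, -3]      # first differences from Table 7.2
a_hat0, s_exact = exact_ma1(w, 0.5)
print(round(a_hat0, 3), round(s_exact, 3))
```

For these ten values the exact and approximate starting values differ only in the third decimal place, so the two sums of squares agree to the precision quoted in the text.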


7.1.5 General Procedure for Calculating the Unconditional Sum of Squares 

In the above example, $w_t$ was a first-order moving average process, with zero mean. It
followed that all forecasts for lead times greater than 1 were zero and consequently that only
one preliminary value (the back-forecast $[w_0] = 1.6$) was required to start the recursive
calculations using the approximate approach, and only one value $[a_0]$ in the exact approach. For
a $q$th-order moving average process, $q$ nonzero preliminary values $[w_0], [w_{-1}], \ldots, [w_{1-q}]$
would be needed, or equivalently, the $q$ values $[a_0], [a_{-1}], \ldots, [a_{1-q}]$ in the exact approach,
with $S(\boldsymbol{\theta}) = \sum_{t=1-q}^{n} [a_t]^2$. Special procedures, which we discuss in Section 7.3.1, are
available for estimating parameters in autoregressive models. However, we show in Appendix
A7.3 that the procedure described in this section can supply the unconditional sum of
squares for any ARIMA model.

Specifically, suppose that the $w_t$'s are generated by the stationary forward model

$$\phi(B)\tilde{w}_t = \theta(B)a_t \tag{7.1.9}$$

where $w_t = \nabla^d z_t$ and $\tilde{w}_t = w_t - \mu$. Then, they could equally well have been generated by
the backward model

$$\phi(F)\tilde{w}_t = \theta(F)e_t \tag{7.1.10}$$

As before, in the approximate method that utilizes the backward model, we could first
employ (7.1.10) to supply back-forecasts $[\tilde{w}_{-j} \mid \mathbf{w}, \boldsymbol{\phi}, \boldsymbol{\theta}]$. Theoretically, the presence of
the autoregressive operator ensures a series of such estimates that is infinite in extent.
However, assuming stationarity, the estimates $[\tilde{w}_t]$ at and beyond some point $t = -Q$, with
$Q$ of moderate size, become essentially equal to zero. Thus, to a sufficient approximation,
we can write

$$\tilde{w}_t = \phi^{-1}(B)\theta(B)a_t = \sum_{j=0}^{\infty} \psi_j a_{t-j} \approx \sum_{j=0}^{Q} \psi_j a_{t-j}$$

This means that the original mixed process could be replaced by a moving average process
of order $Q$, and the procedure for moving averages outlined in Section 7.1.4 may be used.
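
One way to see how large $Q$ needs to be is to compute the $\psi$ weights and find where they become negligible. The sketch below is an illustration only: the function names, the tolerance `tol`, and the cap `max_lag` are assumptions, and the recursion used is the standard one obtained by equating coefficients in $\phi(B)\psi(B) = \theta(B)$ under the sign convention of this chapter.

```python
def psi_weights(phi, theta, n_weights):
    """psi_0, ..., psi_{n_weights-1} of phi(B)^{-1} theta(B), where
    phi(B) = 1 - phi_1 B - ... and theta(B) = 1 - theta_1 B - ..."""
    psi = [1.0]
    for j in range(1, n_weights):
        # psi_j = phi_1 psi_{j-1} + ... + phi_p psi_{j-p} - theta_j
        val = -theta[j - 1] if j <= len(theta) else 0.0
        for i in range(1, min(j, len(phi)) + 1):
            val += phi[i - 1] * psi[j - i]
        psi.append(val)
    return psi

def truncation_order(phi, theta, tol=1e-4, max_lag=1000):
    """Smallest Q with |psi_j| < tol for all Q < j <= max_lag (a heuristic)."""
    psi = psi_weights(phi, theta, max_lag + 1)
    Q = max_lag
    while Q > 0 and abs(psi[Q]) < tol:
        Q -= 1
    return Q

# Hypothetical ARMA(1, 1) with phi_1 = 0.5 and theta_1 = 0.3.
print(psi_weights([0.5], [0.3], 4))
print(truncation_order([0.5], [0.3]))
```

For a stationary model the weights decay geometrically, so $Q$ stays moderate unless a root of $\phi(B) = 0$ lies close to the unit circle.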

Thus, in general, the dual set of equations for generating the conditional expectations
$[a_t \mid \boldsymbol{\phi}, \boldsymbol{\theta}, \mathbf{w}]$ is obtained by taking conditional expectations in (7.1.10) and (7.1.9). That is,

$$\phi(F)[\tilde{w}_t] = \theta(F)[e_t] \tag{7.1.11}$$

is first used to generate the backward forecasts and then

$$\phi(B)[\tilde{w}_t] = \theta(B)[a_t] \tag{7.1.12}$$

is used to generate the $[a_t]$'s. If we find that the forecasts are negligible in magnitude
beyond some lead time $Q$, the recursive calculation goes forward with

$$[e_{-j} \mid \boldsymbol{\phi}, \boldsymbol{\theta}, \mathbf{w}] = 0 \qquad j = 0, 1, 2, \ldots$$

$$[a_{-j} \mid \boldsymbol{\phi}, \boldsymbol{\theta}, \mathbf{w}] = 0 \qquad j > Q - 1 \tag{7.1.13}$$

and the sum of squares is approximated by $S(\boldsymbol{\phi}, \boldsymbol{\theta}) = \sum_{t=1-Q}^{n} [a_t]^2$. As mentioned earlier,
a second iterative cycle in this approximate method could be used, if desired.

Alternatively, for the general model (7.1.9), the exact method discussed in Appendix
A7.3 can be used to obtain the sum of squares as

$$S(\boldsymbol{\phi}, \boldsymbol{\theta}) = \sum_{t=1-q}^{n} [a_t]^2 + ([\mathbf{w}_*] - \mathbf{C}'[\mathbf{a}_*])'\mathbf{K}^{-1}([\mathbf{w}_*] - \mathbf{C}'[\mathbf{a}_*]) \tag{7.1.14}$$

Here, the vectors $[\mathbf{w}_*]' = ([\tilde{w}_{1-p}], \ldots, [\tilde{w}_0])$ and $[\mathbf{a}_*]' = ([a_{1-q}], \ldots, [a_0])$ are the exact
back-forecasted values obtained as in (A7.3.12). They are given by
$[\mathbf{e}_*] = ([\mathbf{w}_*]', [\mathbf{a}_*]')' = \mathbf{D}^{-1}\mathbf{F}'\mathbf{u}$, where the values $u_t$, $t = 1, \ldots, n$, of the vector $\mathbf{u}$ are obtained through the backward
recursion $u_t = a_t^0 + \theta_1 u_{t+1} + \cdots + \theta_q u_{t+q}$ with zero initial values $u_{n+1} = \cdots = u_{n+q} = 0$, and
the $a_t^0$ are the conditional values of the $a_t$ computed from (7.1.12) using zero initial values,
$a_{1-q}^0 = \cdots = a_0^0 = 0$ and $\tilde{w}_{1-p}^0 = \cdots = \tilde{w}_0^0 = 0$. After solving the equations $\mathbf{D}[\mathbf{e}_*] = \mathbf{F}'\mathbf{u}$, as
described in (A7.3.12), the exact $[a_t]$'s are then calculated through the recursion

$$[a_t] = [\tilde{w}_t] - \phi_1[\tilde{w}_{t-1}] - \cdots - \phi_p[\tilde{w}_{t-p}] + \theta_1[a_{t-1}] + \cdots + \theta_q[a_{t-q}] \tag{7.1.15}$$

for $t = 1, 2, \ldots, n$ using the exact back-forecasts as starting values, with $[\tilde{w}_t] = \tilde{w}_t$ for
$1 \le t \le n$. The matrices $\mathbf{C}$, $\mathbf{K}$, $\mathbf{D}$, and $\mathbf{F}$ necessary for the computation in (7.1.14) are
defined explicitly in Appendix A7.3.

Comment on the Approximation. We saw that for the IMA(0, 1, 1) model fitted to the
IBM Series B, the conditional sum of squares provides a very close approximation to
the unconditional value. This will generally be the case for sufficiently long nonseasonal
time series. However, as is discussed further in Chapter 9, for seasonal series, in particular,
the conditional approximation becomes less satisfactory and the unconditional sum of
squares should ordinarily be computed. Moreover, including the determinant in the
likelihood function to obtain exact maximum likelihood estimates of the parameters can be
beneficial if the roots of the moving average operator are close to the unit circle.

Simulation studies have been performed by Dent and Min (1978) and Ansley and 
Newbold (1980) to empirically investigate and compare the performance of the conditional 
least-squares, unconditional least-squares, and maximum likelihood estimators for ARMA 
models. Generally, the conditional and unconditional least-squares estimators serve as 
satisfactory approximations to the maximum likelihood estimator for large-sample sizes. 
However, the simulation evidence suggests a preference for the maximum likelihood estimator
for small- or moderate-sample sizes, especially if the moving average operator has
a root close to the boundary of the invertibility region. Some additional information on the
relative performance of the different estimators was provided by Hillmer and Tiao (1979)
and Osborn (1982), who examined the expected values of the conditional sum of squares,
the unconditional sum of squares, and the log-likelihood for an MA(1) model, as functions
of the unknown parameter $\theta$, for different sample sizes $n$. These studies provide an idea of
how the corresponding estimators will behave for various sample sizes, and the results are
consistent with those obtained from simulation studies.

FIGURE 7.1 Plot of $S(\theta)$ for Series B.


7.1.6 Graphical Study of the Sum-of-Squares Function 

The sum-of-squares function $S(\theta)$ for the IBM data given in Table 7.1 is plotted in
Figure 7.1. The overall minimum sum of squares is at about $\theta = -0.09$ ($\lambda = 1.09$), which
is the least-squares estimate and, on the assumption of normality, a close approximation to
the maximum likelihood estimate of the parameter $\theta$.

The graphical study of the sum-of-squares functions is readily extended to two parameters
by evaluating the sum of squares over a suitable grid of parameter values and plotting
contours. As discussed earlier, on the assumption of normality, the contours are very nearly
likelihood contours. Figure 7.2 shows a grid of $S(\lambda_0, \lambda_1)$ values for Series B fitted with the
IMA(0, 2, 2) model:

$$\nabla^2 z_t = (1 - \theta_1 B - \theta_2 B^2)a_t = [1 - (2 - \lambda_0 - \lambda_1)B - (\lambda_0 - 1)B^2]a_t \tag{7.1.16}$$

or in the form

$$\nabla^2 z_t = (\lambda_0\nabla + \lambda_1)a_{t-1} + \nabla^2 a_t$$

The minimum sum of squares in Figure 7.2 is at about $\lambda_0 = 1.09$ and $\lambda_1 = 0.0$. The plot
thus confirms that the preferred model in this case is an IMA(0, 1, 1) process. The device
illustrated here, of fitting a model somewhat more elaborate than that expected to be
needed, can provide a useful confirmation of the original identification. The elaboration of
the model should be made, of course, in the direction “feared” to be necessary.
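
The equivalence of the two forms of the IMA(0, 2, 2) model, and the reason a minimum near $\lambda_1 = 0$ points back to the IMA(0, 1, 1) model, can be checked directly. As a brief worked step added for illustration, substituting $\nabla = 1 - B$ and $a_{t-1} = Ba_t$ gives

$$\begin{aligned}
(\lambda_0\nabla + \lambda_1)a_{t-1} + \nabla^2 a_t &= [\lambda_0(1 - B)B + \lambda_1 B + (1 - B)^2]a_t \\
&= [1 - (2 - \lambda_0 - \lambda_1)B - (\lambda_0 - 1)B^2]a_t
\end{aligned}$$

so that $\theta_1 = 2 - \lambda_0 - \lambda_1$ and $\theta_2 = \lambda_0 - 1$. With $\lambda_0 = 1.09$ and $\lambda_1 = 0$, this gives $\theta_1 = 0.91$ and $\theta_2 = 0.09$, and

$$1 - 0.91B - 0.09B^2 = (1 - B)(1 + 0.09B)$$

The factor $(1 - B)$ cancels one of the two differences, leaving $\nabla z_t = (1 + 0.09B)a_t$, that is, the IMA(0, 1, 1) model with $\theta = -0.09$.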







FIGURE 7.2 Values of $S(\lambda_0, \lambda_1) \times 10^{-2}$ for Series B on a grid of $(\lambda_0, \lambda_1)$ values and approximate
contours.


Three Parameters. When we wish to study models with three parameters, two-dimensional
contour diagrams for a number of values of the third parameter can be drawn. For illustration,
part of such a series of diagrams is shown in Figure 7.3 for Series A, C, and D. In
each case, the “elaborated” model

$$\nabla^2 z_t = (1 - \theta_1 B - \theta_2 B^2 - \theta_3 B^3)a_t = [1 - (2 - \lambda_{-1} - \lambda_0 - \lambda_1)B - (\lambda_0 + 2\lambda_{-1} - 1)B^2 + \lambda_{-1}B^3]a_t$$

or

$$\nabla^2 z_t = (\lambda_{-1}\nabla^2 + \lambda_0\nabla + \lambda_1)a_{t-1} + \nabla^2 a_t$$

has been fitted, leading to the conclusion that the best-fitting models of this type⁴ are as
shown in Table 7.3.

The inclusion of additional parameters (particularly $\lambda_{-1}$) in this fitting process is not
strictly necessary, but we have included them to illustrate the effect of overfitting and to
show how closely our identification seems to be confirmed for these series.


4 We show later in Section 7.2.5 that slightly better fits are obtained in some cases with closely related models 
containing “stationary” autoregressive terms.

FIGURE 7.3 Sum-of-squares contours for Series A, C, and D (shaded lines indicate boundaries of
the invertibility regions).


TABLE 7.3 IMA Models Fitted to Series A, C, and D

Series   $\lambda_{-1}$   $\lambda_0$   $\lambda_1$   Fitted Series
A        0                0.3           0.0           $\nabla z_t = 0.3a_{t-1} + \nabla a_t$
C        0                1.1           0.8           $\nabla^2 z_t = 1.1\nabla a_{t-1} + 0.8a_{t-1} + \nabla^2 a_t$
D        0                0.9           0.0           $\nabla z_t = 0.9a_{t-1} + \nabla a_t$


7.1.7 Examination of the Likelihood Function and Confidence Regions 

The likelihood function is not, of course, plotted merely to indicate maximum likelihood 
values. The graph of this function contains the totality of information that comes from the 
data. In some fields of study, cases can occur where the likelihood function has two or more 
peaks and also where the likelihood function contains sharp ridges and spikes. In each such
case, the likelihood function is trying to tell us something that we need to know. Thus, the
existence of two peaks of approximately equal heights implies that there are two sets of 
parameter values that might explain the data. The existence of obliquely oriented ridges 
means that a value of one parameter, considerably different from its maximum likelihood 
value, could explain the data if accompanied by a value of the other parameter, which 
deviated appropriately. To understand the estimation fully, it is thus useful to examine the 
likelihood function both analytically and graphically. 

Need for Care in Interpreting the Likelihood Function. Care is needed in interpreting 
the likelihood function. For example, results discussed later, which assume that the log- 
likelihood is approximately quadratic near its maximum, will clearly not apply to the 
three-parameter cases depicted in Figure 7.3. However, these examples are exceptional 
because here we are deliberately overfitting the model. If the simpler model is justified, 
we should expect to find the likelihood function contours truncated near its maximum by a 
boundary in the higher dimensional parameter space. However, quadratic approximations 
could be used if the simpler identified model rather than the overparameterized model was 
fitted. 

Special care is needed when the maximum of the likelihood function may be on or near
a boundary. Consider the situation shown in Figure 7.4 and suppose we know a priori that
a parameter $\beta > \beta_0$. The maximum likelihood within the permissible range of $\beta$ is at B,
where $\beta = \beta_0$, not at A or at C. It will be noticed that the first derivative of the likelihood is
in this case nonzero at the maximum likelihood value and that the quadratic approximation
is certainly not an adequate representation of the likelihood.

When a class of estimation problems is examined initially, it is important to plot the
likelihood function to identify potential issues. After the behavior of a potential model is
well understood, and knowledge of the situation indicates that it is appropriate to do so, we 
may take certain shortcuts, which we now consider. We begin by considering expressions 
for the variances and covariances of maximum likelihood estimates, appropriate when the 
log-likelihood is approximately quadratic and the sample size is moderately large. 

FIGURE 7.4 Hypothetical likelihood function with a constraint $\beta > \beta_0$.

In what follows, it is convenient to define a vector $\boldsymbol{\beta}$ whose $k = p + q$ elements are
the autoregressive and moving average parameters $\boldsymbol{\phi}$ and $\boldsymbol{\theta}$. Thus, the complete set of
$p + q + 1 = k + 1$ parameters of the ARMA process may be written as $\boldsymbol{\phi}, \boldsymbol{\theta}, \sigma_a^2$; or as $\boldsymbol{\beta}, \sigma_a^2$;
or simply as $\boldsymbol{\xi}$.

Variances and Covariances of ML Estimates. For the appropriately parameterized ARMA
model, it will often happen that over the relevant⁵ region of the parameter space, the
log-likelihood is approximately quadratic in the elements of $\boldsymbol{\beta}$ (i.e., of $\boldsymbol{\phi}$ and $\boldsymbol{\theta}$), so that

$$l(\boldsymbol{\xi}) = l(\boldsymbol{\beta}, \sigma_a^2) \approx l(\hat{\boldsymbol{\beta}}, \sigma_a^2) + \frac{1}{2}\sum_{i=1}^{k}\sum_{j=1}^{k} l_{ij}(\beta_i - \hat{\beta}_i)(\beta_j - \hat{\beta}_j) \tag{7.1.17}$$

where, to the approximation considered, the derivatives

$$l_{ij} = \frac{\partial^2 l(\boldsymbol{\beta}, \sigma_a^2)}{\partial\beta_i\,\partial\beta_j} \tag{7.1.18}$$

are constant. For large $n$, the influence of the term $f(\boldsymbol{\phi}, \boldsymbol{\theta})$ in (7.1.5) can be ignored in most
cases. Hence, $l(\boldsymbol{\beta}, \sigma_a^2)$ will be essentially quadratic in $\boldsymbol{\beta}$ if this is true for $S(\boldsymbol{\beta})$. Alternatively,
$l(\boldsymbol{\beta}, \sigma_a^2)$ will be essentially quadratic in $\boldsymbol{\beta}$ if the conditional expectations $[a_t \mid \boldsymbol{\beta}, \mathbf{w}]$ in (7.1.6)
are approximately locally linear in the elements of $\boldsymbol{\beta}$. Thus, for moderate- and large-sample
sizes $n$, when the local quadratic approximation (7.1.17) is adequate, useful approximations
to the variances and covariances of the estimates and approximate confidence regions may
be obtained.


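Information Matrix for the Parameters $\boldsymbol{\beta}$. The $(k \times k)$ matrix $-\{E[l_{ij}]\} = \mathbf{I}(\boldsymbol{\beta})$ is referred
to (Fisher, 1956; Whittle, 1953) as the information matrix for the parameters $\boldsymbol{\beta}$,
where the expectation is taken over the distribution of $\mathbf{w}$. For a given value of $\sigma_a^2$, the
variance-covariance matrix $\mathbf{V}(\hat{\boldsymbol{\beta}})$ for the ML estimates $\hat{\boldsymbol{\beta}}$ is, for large samples, given by
the inverse of this information matrix, that is,

$$\mathbf{V}(\hat{\boldsymbol{\beta}}) \approx \{-E[l_{ij}]\}^{-1} = \mathbf{I}^{-1}(\boldsymbol{\beta}) \tag{7.1.19}$$

For example, if $k = 2$, the large-sample variance-covariance matrix is

$$\mathbf{V}(\hat{\boldsymbol{\beta}}) = \begin{bmatrix} V(\hat{\beta}_1) & \operatorname{cov}[\hat{\beta}_1, \hat{\beta}_2] \\ \operatorname{cov}[\hat{\beta}_1, \hat{\beta}_2] & V(\hat{\beta}_2) \end{bmatrix} \approx -\begin{bmatrix} E[l_{11}] & E[l_{12}] \\ E[l_{12}] & E[l_{22}] \end{bmatrix}^{-1}$$

In addition, the ML estimates $\hat{\boldsymbol{\beta}}$ obtained from a stationary invertible ARMA process
were shown to be asymptotically distributed as multivariate normal with mean vector $\boldsymbol{\beta}$
and covariance matrix $\mathbf{I}^{-1}(\boldsymbol{\beta})$ (e.g., Mann and Wald, 1943; Whittle, 1953; Hannan, 1960;
Walker, 1964) in the sense that $n^{1/2}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})$ converges in distribution to the multivariate
normal $N\{\mathbf{0}, \mathbf{I}_*^{-1}(\boldsymbol{\beta})\}$ as $n \to \infty$, where $\mathbf{I}_*(\boldsymbol{\beta}) = \lim n^{-1}\mathbf{I}(\boldsymbol{\beta})$. The specific form of the
information matrix $\mathbf{I}(\boldsymbol{\beta})$ and the limiting matrix $\mathbf{I}_*(\boldsymbol{\beta})$ for ARMA($p, q$) models are described
in Section 7.2.6, and details on the asymptotic normality of the estimator $\hat{\boldsymbol{\phi}}$ are examined
for the special case of AR models in Appendix A7.5.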

5 Say over a 95% confidence region. 
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Now, using (7.1.5), we have 


'• 2<T" 


(7.1.20) 


where 

d 2 S(fi |w) 
iJ dpfdfij 

Furthermore, if for large samples, we approximate the expected values of /y or of Ay by 
the values actually observed, then, using (7.1.19), we obtain 

V(/3) * {-E[l, tj]}- 1 ~ 2o- 2 a {E[S u ]}- 1 c 2<7 2 {Ay}- 1 (7.1.21) 

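Thus, for $k = 2$,

$$\mathbf{V}(\hat{\boldsymbol{\beta}}) \simeq 2\sigma_a^2 \begin{bmatrix} \dfrac{\partial^2 S(\boldsymbol{\beta})}{\partial \beta_1^2} & \dfrac{\partial^2 S(\boldsymbol{\beta})}{\partial \beta_1 \, \partial \beta_2} \\[2ex] \dfrac{\partial^2 S(\boldsymbol{\beta})}{\partial \beta_1 \, \partial \beta_2} & \dfrac{\partial^2 S(\boldsymbol{\beta})}{\partial \beta_2^2} \end{bmatrix}^{-1}$$

If $S(\boldsymbol{\beta})$ were exactly quadratic in $\boldsymbol{\beta}$ over the relevant region of the parameter space, then all the derivatives $S_{ij}$ would be constant over this region. In practice, the $S_{ij}$ will vary somewhat, and we will usually assume that the derivatives are determined at or near the point $\hat{\boldsymbol{\beta}}$. Now, it is shown in Appendices A7.3 and A7.4 that an estimate (footnote 6) of $\sigma_a^2$ is provided by

$$\hat{\sigma}_a^2 = \frac{S(\hat{\boldsymbol{\beta}})}{n} \tag{7.1.22}$$

and that for large samples, $\hat{\sigma}_a^2$ and $\hat{\boldsymbol{\beta}}$ are uncorrelated. Finally, the elements of (7.1.21) may be estimated from

$$\operatorname{cov}[\hat{\beta}_i, \hat{\beta}_j] \simeq 2\hat{\sigma}_a^2 S^{ij} \tag{7.1.23}$$

where the $(k \times k)$ matrix $\{S^{ij}\}$ is given by $\{S^{ij}\} = \{S_{ij}\}^{-1}$ and the expression (7.1.23) is understood to define the variance $V(\hat{\beta}_i)$ when $j = i$.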

Approximate Confidence Regions for the Parameters. In particular, these results allow us to obtain the approximate variances of our estimates. By taking the square root of these variances, we obtain approximate standard errors (SE) of the estimates. The standard error of an estimate $\hat{\beta}_i$ is denoted by SE$[\hat{\beta}_i]$. When we have to consider several parameters simultaneously, we need some means of judging the precision of the estimates jointly. One means of doing this is to determine a confidence region. If, for given $\sigma_a^2$, $l(\boldsymbol{\beta}, \sigma_a^2)$ is approximately quadratic in $\boldsymbol{\beta}$ in the neighborhood of $\hat{\boldsymbol{\beta}}$, then using (7.1.19) (see also


6 Arguments can be advanced for using the divisor $n - k = n - p - q$ rather than $n$ in (7.1.22), but for moderate sample sizes, this modification does not make much difference.

TABLE 7.4 S(λ) and Its First and Second Differences for Various Values of λ for Series B

λ = 1 − θ      S(λ)       ∇(S)      ∇²(S)
   1.5        23,928      2,333       960
   1.4        21,595      1,373       634
   1.3        20,222        739       476
   1.2        19,483        263       406
   1.1        19,220       -143       390
   1.0        19,363       -533       422
   0.9        19,896       -955       508
   0.8        20,851     -1,463       691
   0.7        22,314     -2,154     1,069
   0.6        24,468     -3,223
   0.5        27,691

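Appendix A7.1), an approximate $1 - \varepsilon$ confidence region will be defined by

$$-\sum_{i}\sum_{j} E[l_{ij}](\beta_i - \hat{\beta}_i)(\beta_j - \hat{\beta}_j) < \chi^2_{\varepsilon}(k) \tag{7.1.24}$$

where $\chi^2_{\varepsilon}(k)$ is the significance point exceeded by a proportion $\varepsilon$ of the $\chi^2$ distribution, having $k$ degrees of freedom.

Alternatively, using the approximation (7.1.21) and substituting the estimate of (7.1.22) for $\sigma_a^2$, the approximate confidence region is given by (footnote 7)

$$\sum_{i}\sum_{j} S_{ij}(\beta_i - \hat{\beta}_i)(\beta_j - \hat{\beta}_j) < 2\hat{\sigma}_a^2 \chi^2_{\varepsilon}(k) \tag{7.1.25}$$

However, for a quadratic $S(\boldsymbol{\beta})$ surface

$$S(\boldsymbol{\beta}) - S(\hat{\boldsymbol{\beta}}) = \frac{1}{2}\sum_{i}\sum_{j} S_{ij}(\beta_i - \hat{\beta}_i)(\beta_j - \hat{\beta}_j) \tag{7.1.26}$$

Thus, using (7.1.22) and (7.1.25), we finally obtain the result that the approximate $1 - \varepsilon$ confidence region is bounded by the contour on the sum-of-squares surface, for which

$$S(\boldsymbol{\beta}) = S(\hat{\boldsymbol{\beta}})\left[1 + \frac{\chi^2_{\varepsilon}(k)}{n}\right] \tag{7.1.27}$$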


7 A somewhat closer approximation based on the $F$ distribution, which takes account of the approximate sampling distribution of $\hat{\sigma}_a^2$, may be employed. For moderate sample sizes this refinement does not make much practical difference.

Examples of the Calculation of Approximate Confidence Intervals and Regions.

1. Example: Series B. For Series B, values of $S(\lambda)$ and of its differences are shown in Table 7.4. The second difference of $S(\lambda)$ is not constant, and thus $S(\lambda)$ is not strictly quadratic. However, in the range from $\lambda = 0.85$ to $\lambda = 1.35$, $\nabla^2(S)$ does not change greatly, so that (7.1.27) can be expected to provide a reasonably close approximation. With a minimum value $S(\hat{\lambda}) = 19{,}216$, the critical value $S(\lambda)$, defining an approximate 95% confidence interval, is then given by

$$S(\lambda) = 19{,}216\left(1 + \frac{3.84}{368}\right) = 19{,}416$$

Reading off the values of $\lambda$ corresponding to $S(\lambda) = 19{,}416$ in Figure 7.1, we obtain an approximate confidence interval $0.98 < \lambda < 1.19$.

Alternatively, we can employ (7.1.25). Using the second difference at $\lambda = 1.1$, given in Table 7.4, to approximate the derivative, we obtain

$$S_{11} = \frac{\partial^2 S}{\partial \lambda^2} \simeq \frac{390}{(0.1)^2}$$

Also, using (7.1.22), $\hat{\sigma}_a^2 = 19{,}216/368 = 52.2$. Thus, the 95% confidence interval, defined by (7.1.25), is

$$\frac{390}{(0.1)^2}(\lambda - 1.09)^2 < 2 \times 52.2 \times 3.84$$

that is, $|\lambda - 1.09| < 0.10$. Thus, the interval is $0.99 < \lambda < 1.19$, which agrees closely with the previous calculation.

In this example, where there is only a single parameter $\lambda$, the use of (7.1.24) and (7.1.25) is equivalent to using an interval $\hat{\lambda} \pm u_{\varepsilon/2}\hat{\sigma}(\hat{\lambda})$, where $u_{\varepsilon/2}$ is the value that excludes a proportion $\varepsilon/2$ in the upper tail of the standard normal distribution. An approximate standard error for $\hat{\lambda}$, $\hat{\sigma}(\hat{\lambda}) = \sqrt{2\hat{\sigma}_a^2 S_{11}^{-1}}$, is obtained from (7.1.23). In the present example,

$$V(\hat{\lambda}) = 2\hat{\sigma}_a^2 S_{11}^{-1} = \frac{2 \times 52.2 \times 0.1^2}{390} = 0.00268$$

and the approximate standard error is $\hat{\sigma}(\hat{\lambda}) = \sqrt{0.00268} = 0.052$. Thus, the approximate 95% confidence interval is $\hat{\lambda} \pm 1.96\hat{\sigma}(\hat{\lambda}) = 1.09 \pm 0.10$, as before.

Finally, we show later in Section 7.2.6 that it is possible to evaluate (7.1.19) analytically, for large samples from an MA(1) process, yielding

$$V(\hat{\lambda}) \simeq \frac{\lambda(2 - \lambda)}{n}$$

For the present example, substituting $\hat{\lambda} = 1.09$ for $\lambda$, we find that $V(\hat{\lambda}) \simeq 0.00269$, which agrees closely with the previous estimate and so yields the same standard error of 0.052 and the same confidence interval.

2. Example: Series C. In the identification of Series C, one model that was entertained was a (0, 2, 2) process. To illustrate the application of (7.1.27) for more than one parameter, Figure 7.5 shows an approximate 95% confidence region (shaded) for $\lambda_0$ and $\lambda_1$ of Series C. For this example, $S(\hat{\boldsymbol{\lambda}}) = 4.20$, $n = 224$, and $\chi^2_{0.05}(2) = 5.99$, so that the approximate 95% confidence region is bounded by the contour for which

$$S(\lambda_0, \lambda_1) = 4.20\left(1 + \frac{5.99}{224}\right) = 4.31$$

FIGURE 7.5 Sum-of-squares contours with shaded 95% confidence region for Series C, assuming a model of order (0, 2, 2).
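The arithmetic in the two examples above can be retraced with a short script. The sketch below is illustrative only: it copies the $S(\lambda)$ column of Table 7.4 and the constants quoted in the text (368, 19,216, 1.09, 3.84, 5.99, 224, 4.20), and all variable names are our own.

```python
# Sum-of-squares values S(lambda) for Series B, copied from Table 7.4.
lambdas = [1.5, 1.4, 1.3, 1.2, 1.1, 1.0, 0.9, 0.8, 0.7, 0.6, 0.5]
S = [23928, 21595, 20222, 19483, 19220, 19363, 19896, 20851, 22314, 24468, 27691]

# First and second differences, laid out row by row as in Table 7.4.
d1 = [S[i] - S[i + 1] for i in range(len(S) - 1)]
d2 = [d1[i] - d1[i + 1] for i in range(len(d1) - 1)]

n = 368          # divisor used in the text for Series B
S_min = 19216    # minimum sum of squares, at lambda_hat = 1.09
lam_hat = 1.09
chi2_1 = 3.84    # 5% point of chi-square with 1 degree of freedom
step = 0.1       # spacing of the lambda grid in Table 7.4

S_crit = S_min * (1 + chi2_1 / n)                    # contour value from (7.1.27)
sigma2_hat = S_min / n                               # residual variance from (7.1.22)
S11 = d2[lambdas.index(1.1)] / step ** 2             # second derivative from the row lambda = 1.1
half_width = (2 * sigma2_hat * chi2_1 / S11) ** 0.5  # interval half-width from (7.1.25)
var_lam = 2 * sigma2_hat / S11                       # variance from (7.1.23)
se_lam = var_lam ** 0.5
var_ma1 = lam_hat * (2 - lam_hat) / n                # large-sample MA(1) formula

# Series C: two parameters, so the chi-square point has 2 degrees of freedom.
S_crit_C = 4.20 * (1 + 5.99 / 224)

print(round(S_crit, 1), round(half_width, 3), round(se_lam, 3), round(S_crit_C, 2))
```

The small discrepancies against the printed figures (for example a critical value near 19,416.5 rather than 19,416) come from rounding in the text.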



7.2 NONLINEAR ESTIMATION 
7.2.1 General Method of Approach 

The plotting of the sum-of-squares function is of particular importance in the study of new 
estimation problems because it ensures that any peculiarities in the estimation situation 
show up. When we are satisfied that anomalies are unlikely, other methods may be used. 

We have seen that for most cases, the maximum likelihood estimates are closely approximated by the least-squares estimates, which minimize

$$S(\boldsymbol{\phi}, \boldsymbol{\theta}) = \sum_{t=1}^{n} [a_t]^2 + [\mathbf{e}_*]'\boldsymbol{\Omega}^{-1}[\mathbf{e}_*]$$

and in practice, this function can be approximated by a finite sum $\sum_{t=1-Q}^{n} [a_t]^2$.

In general, considerable simplification occurs in the minimization with respect to $\boldsymbol{\beta}$ of a sum of squares $\sum_{t=1}^{n} [f_t(\boldsymbol{\beta})]^2$, if each $f_t(\boldsymbol{\beta})$ $(t = 1, 2, \ldots, n)$ is a linear function of the parameters $\boldsymbol{\beta}$. We now show that the autoregressive and moving average models differ with respect to the linearity of the $[a_t]$. For the purely autoregressive process, $[a_t] = \phi(B)[w_t] = [w_t] - \sum_{i=1}^{p} \phi_i [w_{t-i}]$ and

$$\frac{\partial [a_t]}{\partial \phi_i} = -[w_{t-i}] + \phi(B)\frac{\partial [w_t]}{\partial \phi_i}$$

Now for $u > 0$, $[w_u] = w_u$ and $\partial [w_u]/\partial \phi_i = 0$, while for $u \le 0$, $[w_u]$ and $\partial [w_u]/\partial \phi_i$ are both functions of $\boldsymbol{\phi}$. Thus, except for the effect of "starting values," $[a_t]$ is linear in the $\phi$'s. By contrast, for the pure moving average process,

$$[a_t] = \theta^{-1}(B)[w_t] \qquad \frac{\partial [a_t]}{\partial \theta_j} = \theta^{-2}(B)[w_{t-j}] + \theta^{-1}(B)\frac{\partial [w_t]}{\partial \theta_j}$$

so that the $[a_t]$'s are always nonlinear functions of the moving average parameters.

We will see in Section 7.3 that special simplifications occur in obtaining least-squares 
and maximum likelihood estimates for the autoregressive process. We show in the present 
section how, by iterative application of linear least-squares, estimates may be obtained for 
any ARMA process. 

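Linearization of the Model. In what follows, we continue to use $\boldsymbol{\beta}$ as a general symbol for the $k = p + q$ parameters $(\boldsymbol{\phi}, \boldsymbol{\theta})$. We need, then, to minimize

$$S(\boldsymbol{\phi}, \boldsymbol{\theta}) \simeq \sum_{t=1-Q}^{n} [a_t \mid \mathbf{w}, \boldsymbol{\beta}]^2 = \sum_{t=1-Q}^{n} [a_t]^2$$

Expanding $[a_t]$ in a Taylor series about its value corresponding to some guessed set of parameter values $\boldsymbol{\beta}_0' = (\beta_{1,0}, \beta_{2,0}, \ldots, \beta_{k,0})$, we have approximately

$$[a_t] = [a_{t,0}] - \sum_{i=1}^{k} (\beta_i - \beta_{i,0})\, x_{t,i} \tag{7.2.1}$$

where $[a_{t,0}] = [a_t \mid \mathbf{w}, \boldsymbol{\beta}_0]$ and

$$x_{t,i} = -\left.\frac{\partial [a_t]}{\partial \beta_i}\right|_{\boldsymbol{\beta} = \boldsymbol{\beta}_0}$$

Now, if $\mathbf{X}$ is the $(n + Q) \times k$ matrix $\{x_{t,i}\}$, then the $n + Q$ equations (7.2.1) may be expressed as

$$[\mathbf{a}_0] = \mathbf{X}(\boldsymbol{\beta} - \boldsymbol{\beta}_0) + [\mathbf{a}]$$

where $[\mathbf{a}_0]$ and $[\mathbf{a}]$ are column vectors with $n + Q$ elements.

The adjustments $\boldsymbol{\beta} - \boldsymbol{\beta}_0$, which minimize $S(\boldsymbol{\beta}) = S(\boldsymbol{\phi}, \boldsymbol{\theta}) = [\mathbf{a}]'[\mathbf{a}]$, may now be obtained by linear least-squares, that is, by "regressing" the $[a_0]$'s onto the $x$'s. This gives the usual linear least-squares estimates, as presented in Appendix A7.2.1, of the adjustments as $\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}_0 = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'[\mathbf{a}_0]$; hence, $\hat{\boldsymbol{\beta}} = \boldsymbol{\beta}_0 + (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'[\mathbf{a}_0]$. Because the $[a_t]$'s will not be exactly linear in the parameters $\boldsymbol{\beta}$, a single adjustment will not immediately produce the final least-squares values. Instead, the adjusted values $\hat{\boldsymbol{\beta}}$ are substituted as new guesses and the process is repeated until convergence occurs. Convergence is faster if reasonably good guesses, such as may be obtained at the identification stage, are used initially. If sufficiently bad initial guesses are used, the process may not converge at all.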

7.2.2 Numerical Estimates of the Derivatives 

The derivatives $x_{t,i}$ may be obtained directly, as we illustrate later. They can also be computed numerically using a general nonlinear least-squares routine. This is done by perturbing the parameters "one at a time." Thus, for a given model, the values $[a_t \mid \mathbf{w}, \beta_{1,0}, \beta_{2,0}, \ldots, \beta_{k,0}]$ for $t = 1 - Q, \ldots, n$ are calculated recursively, using whatever preliminary "back-forecasts" may be needed. The calculation is then repeated for $[a_t \mid \mathbf{w}, \beta_{1,0} + \delta_1, \beta_{2,0}, \ldots, \beta_{k,0}]$, then for $[a_t \mid \mathbf{w}, \beta_{1,0}, \beta_{2,0} + \delta_2, \ldots, \beta_{k,0}]$, and so on. The negative of the required derivative is then given to sufficient accuracy using

$$x_{t,i} = \frac{[a_t \mid \mathbf{w}, \beta_{1,0}, \ldots, \beta_{i,0}, \ldots, \beta_{k,0}] - [a_t \mid \mathbf{w}, \beta_{1,0}, \ldots, \beta_{i,0} + \delta_i, \ldots, \beta_{k,0}]}{\delta_i} \tag{7.2.2}$$

The numerical method described above has the advantage of universal applicability and requires us to program the calculation of the $[a_t]$'s only, not their derivatives. General nonlinear estimation routines, which essentially require only input instructions on how to compute the $[a_t]$'s, are generally available. In some versions, it is necessary to choose the $\delta$'s in advance. In others, the program itself carries through a preliminary iteration to find suitable $\delta$'s. Many programs include special features to avoid overshoot and to speed up convergence.

Provided that the least-squares solution is not on or near a constraining boundary, the value of $\mathbf{X} = \mathbf{X}_{\hat{\beta}}$ from the final iteration may be used to compute approximate variances, covariances, and confidence intervals. Thus, similar to the usual linear least-squares results in Appendix A7.2.3,

$$(\mathbf{X}_{\hat{\beta}}'\mathbf{X}_{\hat{\beta}})^{-1}\sigma_a^2$$

will approximate the variance-covariance matrix of the $\hat{\beta}$'s, and $\sigma_a^2$ will be estimated by $\hat{\sigma}_a^2 = S(\hat{\boldsymbol{\beta}})/n$.
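The iteration of Sections 7.2.1 and 7.2.2 can be sketched for the simplest nonlinear case, an MA(1) model $w_t = a_t - \theta a_{t-1}$. The code below is a minimal illustration under stated assumptions, not the book's program: the data are simulated, the residual recursion is started from $a_0 = 0$ instead of using back-forecasts, a step-halving safeguard is added, and all function names are hypothetical.

```python
import math
import random


def residuals(w, theta):
    """Conditional residuals a_t = w_t + theta * a_{t-1}, started from a_0 = 0."""
    a, prev = [], 0.0
    for wt in w:
        prev = wt + theta * prev
        a.append(prev)
    return a


def neg_derivative(w, theta, delta=1e-4):
    """x_t of (7.2.2): perturb theta by delta and difference the residuals."""
    base = residuals(w, theta)
    bumped = residuals(w, theta + delta)
    return [(b - p) / delta for b, p in zip(base, bumped)]


def fit_ma1(w, theta0, max_iter=50, tol=1e-8):
    """Repeated linear least squares for MA(1), with step halving as a safeguard."""
    theta = theta0
    for _ in range(max_iter):
        a0 = residuals(w, theta)
        x = neg_derivative(w, theta)
        s_old = sum(v * v for v in a0)
        # Regress the a_{t,0} on the x_t: one coefficient, the adjustment.
        step = sum(xi * ai for xi, ai in zip(x, a0)) / sum(xi * xi for xi in x)
        accepted = False
        for _halving in range(30):
            cand = theta + step
            if abs(cand) < 1 and sum(v * v for v in residuals(w, cand)) <= s_old:
                theta, accepted = cand, True
                break
            step /= 2
        if not accepted or abs(step) < tol:
            break
    a_hat = residuals(w, theta)
    x_hat = neg_derivative(w, theta)
    sigma2 = sum(v * v for v in a_hat) / len(w)              # S(theta_hat) / n
    se = math.sqrt(sigma2 / sum(xi * xi for xi in x_hat))   # (X'X)^{-1} sigma^2
    return theta, se, sigma2


# Simulated MA(1) series (assumed data, theta = 0.5, unit-variance shocks).
rng = random.Random(12345)
n, theta_true = 400, 0.5
shocks = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]
w = [shocks[t] - theta_true * shocks[t - 1] for t in range(1, n + 1)]

theta_hat, se_hat, sigma2_hat = fit_ma1(w, theta0=0.1)
print(round(theta_hat, 3), round(se_hat, 3), round(sigma2_hat, 3))
```

The standard error returned here should be close to the large-sample value $\sqrt{(1 - \hat{\theta}^2)/n}$ discussed in Section 7.2.6.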

7.2.3 Direct Evaluation of the Derivatives 

We now show that it is also possible to obtain derivatives directly, but additional recursive calculations are needed. To illustrate the method, it is sufficient to consider an ARMA(1, 1) process, which can be written in either of the forms

$$e_t = w_t - \phi w_{t+1} + \theta e_{t+1}$$
$$a_t = w_t - \phi w_{t-1} + \theta a_{t-1}$$

We have seen in Section 7.1.4 how the two versions of the model may be used in alternation, one providing initial values with which to start off a recursion with the other. We assume that a first computation has already been made yielding values of $[e_t]$, of $[a_t]$, and of $[w_0], [w_{-1}], \ldots, [w_{1-Q}]$, as in Section 7.1.5, and that $[w_{-Q}], [w_{-Q-1}], \ldots$ and hence $[a_{-Q}], [a_{-Q-1}], \ldots$ are negligible. We now show that a similar dual calculation may be used in calculating derivatives.

Using the notation $a_t^{(\phi)}$ to denote the partial derivative $\partial [a_t]/\partial \phi$, we obtain

$$e_t^{(\phi)} = w_t^{(\phi)} - \phi w_{t+1}^{(\phi)} + \theta e_{t+1}^{(\phi)} - [w_{t+1}] \tag{7.2.3}$$
$$a_t^{(\phi)} = w_t^{(\phi)} - \phi w_{t-1}^{(\phi)} + \theta a_{t-1}^{(\phi)} - [w_{t-1}] \tag{7.2.4}$$
$$e_t^{(\theta)} = w_t^{(\theta)} - \phi w_{t+1}^{(\theta)} + \theta e_{t+1}^{(\theta)} + [e_{t+1}] \tag{7.2.5}$$
$$a_t^{(\theta)} = w_t^{(\theta)} - \phi w_{t-1}^{(\theta)} + \theta a_{t-1}^{(\theta)} + [a_{t-1}] \tag{7.2.6}$$

Now,

$$[w_t] = w_t, \qquad w_t^{(\phi)} = w_t^{(\theta)} = 0, \qquad t = 1, 2, \ldots, n \tag{7.2.7}$$

and

$$[e_{-j}] = 0, \qquad j = 0, 1, \ldots, n \tag{7.2.8}$$

Consider equations (7.2.3) and (7.2.4). By setting $e_{n+1}^{(\phi)} = 0$ in (7.2.3), we can begin a back recursion, which using (7.2.7) and (7.2.8) eventually allows us to compute $w_{-j}^{(\phi)}$ for $j = 0, 1, \ldots, Q - 1$. Since $a_{-Q}^{(\phi)}, a_{-Q-1}^{(\phi)}, \ldots$ can be taken to be zero, we can now use (7.2.4) to compute recursively the required derivatives $a_t^{(\phi)}$. In a similar way, (7.2.5) and (7.2.6) can be used to calculate the derivatives $a_t^{(\theta)}$.


7.2.4 General Least-Squares Algorithm for the Conditional Model 

An approximation that we have sometimes used with long series is to set starting values for the $a_t$'s, and hence for the derivatives in the $x_t$'s, equal to their unconditional expectations of zero and then to proceed directly with the forward recursions. The effect is to introduce a transient into both the $a_t$ and the $x_t$ series, the latter being slower to die out since the $x_t$'s depend on the $a_t$'s. In some instances, where there is an abundance of data (say, 200 or more observations), the effect of the approximation can be nullified at the expense of some loss of information, by discarding, say, the first 10 calculated values.

If we adopt the approximation, an interesting general algorithm for this conditional model results. The ARMA($p$, $q$) model can be written as

$$a_t = \theta^{-1}(B)\phi(B)\tilde{w}_t$$

where $w_t = \nabla^d z_t$, $\tilde{w}_t = w_t - \mu$, and

$$\theta(B) = 1 - \theta_1 B - \cdots - \theta_i B^i - \cdots - \theta_q B^q$$
$$\phi(B) = 1 - \phi_1 B - \cdots - \phi_j B^j - \cdots - \phi_p B^p$$

If the first guesses for the parameters $\boldsymbol{\beta} = (\boldsymbol{\phi}, \boldsymbol{\theta})$ are $\boldsymbol{\beta}_0 = (\boldsymbol{\phi}_0, \boldsymbol{\theta}_0)$, then

$$a_{t,0} = \theta_0^{-1}(B)\phi_0(B)\tilde{w}_t$$

and

$$-\left.\frac{\partial a_t}{\partial \phi_j}\right|_{\boldsymbol{\beta}_0} = u_{t,j} = u_{t-j} \qquad -\left.\frac{\partial a_t}{\partial \theta_i}\right|_{\boldsymbol{\beta}_0} = v_{t,i} = v_{t-i}$$

where

$$u_t = \theta_0^{-1}(B)\tilde{w}_t = \phi_0^{-1}(B)a_{t,0} \tag{7.2.9}$$
$$v_t = -\theta_0^{-2}(B)\phi_0(B)\tilde{w}_t = -\theta_0^{-1}(B)a_{t,0} \tag{7.2.10}$$

The $a_t$'s, $u_t$'s, and $v_t$'s may be calculated recursively, with starting values for $a_t$'s, $u_t$'s, and $v_t$'s set equal to zero, as follows:

$$a_{t,0} = \tilde{w}_t - \phi_{1,0}\tilde{w}_{t-1} - \cdots - \phi_{p,0}\tilde{w}_{t-p} + \theta_{1,0}a_{t-1,0} + \cdots + \theta_{q,0}a_{t-q,0} \tag{7.2.11}$$
$$u_t = \theta_{1,0}u_{t-1} + \cdots + \theta_{q,0}u_{t-q} + \tilde{w}_t \tag{7.2.12}$$
$$\phantom{u_t} = \phi_{1,0}u_{t-1} + \cdots + \phi_{p,0}u_{t-p} + a_{t,0} \tag{7.2.13}$$
$$v_t = \theta_{1,0}v_{t-1} + \cdots + \theta_{q,0}v_{t-q} - a_{t,0} \tag{7.2.14}$$

Corresponding to (7.2.1), the approximate linear regression equation becomes

$$a_{t,0} = \sum_{j=1}^{p} (\phi_j - \phi_{j,0})u_{t-j} + \sum_{i=1}^{q} (\theta_i - \theta_{i,0})v_{t-i} + a_t \tag{7.2.15}$$

The adjustments are then the regression coefficients of $a_{t,0}$ on the $u_{t-j}$ and the $v_{t-i}$. By adding the adjustments to the first guesses $(\boldsymbol{\phi}_0, \boldsymbol{\theta}_0)$, a set of "second guesses" is formed and these now take the place of $(\boldsymbol{\phi}_0, \boldsymbol{\theta}_0)$ in a second iteration, in which new values of $a_{t,0}$, $u_t$, and $v_t$ are computed. The cycle is repeated until convergence occurs.


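Alternative Form for the Algorithm. The approximate linear expansion (7.2.15) can be written in the form

$$a_{t,0} = \sum_{j=1}^{p} (\phi_j - \phi_{j,0})B^j \phi_0^{-1}(B)a_{t,0} - \sum_{i=1}^{q} (\theta_i - \theta_{i,0})B^i \theta_0^{-1}(B)a_{t,0} + a_t$$
$$\phantom{a_{t,0}} = -[\phi(B) - \phi_0(B)]\phi_0^{-1}(B)a_{t,0} + [\theta(B) - \theta_0(B)]\theta_0^{-1}(B)a_{t,0} + a_t$$

that is,

$$a_{t,0} = -\phi(B)[\phi_0^{-1}(B)a_{t,0}] + \theta(B)[\theta_0^{-1}(B)a_{t,0}] + a_t \tag{7.2.16}$$

which presents the algorithm in an interesting form.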

Application to an IMA(0, 2, 2) process. To illustrate the calculation with the conditional approximation, consider the estimation of least-squares values $\hat{\theta}_1$, $\hat{\theta}_2$ for Series C using the model of order (0, 2, 2):

$$w_t = (1 - \theta_1 B - \theta_2 B^2)a_t$$

with $w_t = \nabla^2 z_t$,

$$a_{t,0} = w_t + \theta_{1,0}a_{t-1,0} + \theta_{2,0}a_{t-2,0}$$
$$v_t = -a_{t,0} + \theta_{1,0}v_{t-1} + \theta_{2,0}v_{t-2}$$

Using the initial values $\theta_{1,0} = 0.1$ and $\theta_{2,0} = 0.1$, the first adjustments to $\theta_{1,0}$ and $\theta_{2,0}$ are found by "regressing" $a_{t,0}$ on $v_{t-1}$ and $v_{t-2}$. The process is repeated until convergence occurs. Successive parameter estimates are shown in Table 7.5.

TABLE 7.5 Convergence of Parameter Estimates for IMA(0, 2, 2) Process

Iteration      θ1        θ2
    0        0.1000    0.1000
    1        0.1247    0.1055
    2        0.1266    0.1126
    3        0.1286    0.1141
    4        0.1290    0.1149
    5        0.1292    0.1151
    6        0.1293    0.1152
    7        0.1293    0.1153
    8        0.1293    0.1153
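The iteration behind Table 7.5 can be sketched as follows. Series C itself is not reproduced in this section, so the example runs on a simulated MA(2) series with parameters near those of the table; the data, the seed, and the function names are assumptions made for illustration, and the printed path will not match Table 7.5.

```python
import random


def ma2_pass(w, th1, th2):
    """Recursions of the worked example with zero starting values:
    a_t = w_t + th1*a_{t-1} + th2*a_{t-2} and v_t = -a_t + th1*v_{t-1} + th2*v_{t-2}."""
    a, v = [0.0, 0.0], [0.0, 0.0]   # two leading zeros hold the starting values
    for wt in w:
        at = wt + th1 * a[-1] + th2 * a[-2]
        vt = -at + th1 * v[-1] + th2 * v[-2]
        a.append(at)
        v.append(vt)
    return a, v


def iterate_ma2(w, th1, th2, n_iter=25):
    """Regress a_{t,0} on v_{t-1} and v_{t-2}, add the coefficients to the guesses, repeat."""
    path = [(th1, th2)]
    for _ in range(n_iter):
        a, v = ma2_pass(w, th1, th2)
        s11 = s12 = s22 = b1 = b2 = 0.0
        for t in range(2, len(a)):          # t indexes the zero-padded lists
            x1, x2 = v[t - 1], v[t - 2]
            s11 += x1 * x1
            s12 += x1 * x2
            s22 += x2 * x2
            b1 += x1 * a[t]
            b2 += x2 * a[t]
        det = s11 * s22 - s12 * s12
        th1 += (s22 * b1 - s12 * b2) / det  # solution of the 2 x 2 normal equations
        th2 += (s11 * b2 - s12 * b1) / det
        path.append((th1, th2))
    return path


# Simulated stand-in for w_t (assumed data): MA(2) with theta = (0.13, 0.12).
rng = random.Random(2024)
n = 224
e = [rng.gauss(0.0, 1.0) for _ in range(n + 2)]
w = [e[t] - 0.13 * e[t - 1] - 0.12 * e[t - 2] for t in range(2, n + 2)]

path = iterate_ma2(w, 0.1, 0.1)
for k, (t1, t2) in enumerate(path[:9]):
    print(k, round(t1, 4), round(t2, 4))
```

No safeguard against overshoot is included here; a production routine would normally add one, as noted in Section 7.2.2.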


7.2.5 ARIMA Models Fitted to Series A-F 

In Table 7.6, we summarize the models fitted by the iterative least-squares procedure of Sections 7.2.1 and 7.2.2 to Series A-F. The models fitted were identified in Chapter 6 and summarized in Tables 6.2 and 6.5. In the case of Series A, C, and D, two possible models were identified and subsequently fitted. For Series A and D, the alternative models involve the use of a stationary autoregressive operator $(1 - \phi B)$ instead of the unit-root operator $(1 - B)$. Examination of Table 7.6 shows that in both cases the autoregressive model results in a slightly smaller residual variance although the models are very similar. Even though a slightly better fit is possible with a stationary model, the IMA(0, 1, 1) model might be preferable in these cases on the grounds that, unlike the stationary model, it does not assume that the series has a fixed mean. This is especially important in predicting future values of the series. For if the level does change, a model with $d > 0$ will continue to track it, whereas a model for which $d = 0$ will be tied to a mean level that may have become out of date. It must be noted, however, that for Series D formal unit root testing, to be discussed further in Section 10.1, does not support the need for differencing and suggests a preference for the stationary AR(1) model. Also, unit root testing for Series C indicates a preference for the ARIMA(1, 1, 0) model over a model in terms of second differences. Unit root testing for Series A within the ARMA(1, 1) model, though, does not reject the need for the nonstationary operator $(1 - B)$ for the autoregressive part.

TABLE 7.6 Summary of Models Fitted to Series A-F (a)

Series   Number of      Fitted Models                                                                                                                                      Residual
         Observations                                                                                                                                                      Variance (b)
A        197            $z_t - \underset{(\pm 0.04)}{0.92}\,z_{t-1} = 1.45 + a_t - \underset{(\pm 0.08)}{0.58}\,a_{t-1}$                                                  0.097
                        $\nabla z_t = a_t - \underset{(\pm 0.05)}{0.70}\,a_{t-1}$                                                                                          0.101
B        369            $\nabla z_t = a_t + \underset{(\pm 0.05)}{0.09}\,a_{t-1}$                                                                                          52.2
C        226            $\nabla z_t - \underset{(\pm 0.04)}{0.82}\,\nabla z_{t-1} = a_t$                                                                                   0.018
                        $\nabla^2 z_t = a_t - \underset{(\pm 0.07)}{0.13}\,a_{t-1} - \underset{(\pm 0.07)}{0.12}\,a_{t-2}$                                                0.019
D        310            $z_t - \underset{(\pm 0.03)}{0.87}\,z_{t-1} = 1.17 + a_t$                                                                                          0.090
                        $\nabla z_t = a_t - \underset{(\pm 0.06)}{0.06}\,a_{t-1}$                                                                                          0.096
E        100            $z_t = 14.35 + \underset{(\pm 0.07)}{1.42}\,z_{t-1} - \underset{(\pm 0.07)}{0.73}\,z_{t-2} + a_t$                                                 227.8
                        $z_t = 11.31 + \underset{(\pm 0.10)}{1.57}\,z_{t-1} - \underset{(\pm 0.15)}{1.02}\,z_{t-2} + \underset{(\pm 0.10)}{0.21}\,z_{t-3} + a_t$          218.1
F        70             $z_t = 58.87 - \underset{(\pm 0.12)}{0.34}\,z_{t-1} + \underset{(\pm 0.12)}{0.19}\,z_{t-2} + a_t$                                                 112.7

(a) The values (±) under each estimate denote the standard errors of those estimates.
(b) Obtained from $S(\hat{\boldsymbol{\phi}}, \hat{\boldsymbol{\theta}})/n$.

The limits under the coefficients in Table 7.6 represent the standard errors of the estimates obtained from the covariance matrix $(\mathbf{X}_{\hat{\beta}}'\mathbf{X}_{\hat{\beta}})^{-1}\hat{\sigma}_a^2$, as described in Section 7.2.1. Note that the estimate $\hat{\phi}_3$ in the AR(3) model, fitted to the sunspot Series E, is 2.1 times its standard error, indicating that a marginally better fit is obtained by the third-order autoregressive process, as compared with the second-order autoregressive process. This is in agreement with a conclusion reached by Moran (1954).

Parameter Estimation Using R. Parameter estimation for ARIMA models based on the methods described above is available in the R software package. The relevant tools include the arima() command in the stats package and the sarima() command in the astsa package. Details of the commands are obtained by typing help(arima) and help(sarima) in R. Using the arima() command, the order of the model is specified using the argument order=c(p,d,q), and the estimation method is specified by method=c("CSS") for conditional least-squares and method=c("ML") for the full maximum likelihood method. The sarima() command fits the ARIMA(p, d, q) model to a series z by maximum likelihood using the command sarima(z,p,d,q).

For illustration, we first use the arima() routine in the stats package to estimate the parameters of the ARIMA(3, 0, 0) model for the sunspot data in Series E. The relevant command and a partial model output are provided below.

> arima(ts(seriesE),order=c(3,0,0),method=c("CSS"))

Coefficients:
         ar1      ar2     ar3  intercept
      1.5519  -1.0069  0.2076    46.7513
s.e.  0.0980   0.1540  0.0981     5.9932

sigma^2 estimated as 219.3: log-likelihood = -411.42, aic = NA

We see that the estimates of the autoregressive parameters are very close to the values provided in Table 7.6. However, using this routine, the intercept reported in the output is the mean of the series, so that the constant term in the model needs to be calculated as $\hat{\theta}_0 = \hat{\mu}(1 - \hat{\phi}_1 - \hat{\phi}_2 - \hat{\phi}_3)$. This gives an estimate for the constant of 11.57.
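As a quick check of this conversion, the figures printed in the arima() output above can be plugged into the formula. The snippet only repeats arithmetic on those printed values; the variable names are ours.

```python
# Values copied from the CSS output of arima() shown above.
series_mean = 46.7513                  # reported as "intercept"
phi_hat = [1.5519, -1.0069, 0.2076]    # ar1, ar2, ar3
se_ar3 = 0.0981                        # standard error of ar3

# Constant term: theta_0 = mu * (1 - phi_1 - phi_2 - phi_3)
theta0_hat = series_mean * (1 - sum(phi_hat))

# Ratio of the third AR coefficient to its standard error.
ratio_ar3 = phi_hat[2] / se_ar3

print(round(theta0_hat, 2), round(ratio_ar3, 2))
```

The ratio of about 2.1 for the third coefficient is the same figure quoted in the discussion of Table 7.6.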

The commands and a partial output from performing the analysis using sarima() are as follows:

> library(astsa)
> sarima(ts(seriesE),3,0,0)

Coefficients:
         ar1      ar2     ar3    xmean
      1.5531  -1.0018  0.2063  48.4443
s.e.  0.0981   0.1544  0.0989   6.0706

sigma^2 estimated as 218.2: log-likelihood = -412.49, aic = 834.99
$AIC: [1] 6.465354, $AICc: [1] 6.491737, $BIC: [1] 5.569561

The results are close to the earlier ones. The sarima() command has an advantage in that 
model diagnostics of the type discussed in Chapter 8 below are provided automatically 
as part of the output (see, e.g., Figures 8.2 and 8.3). This allows the user to efficiently 
evaluate the adequacy of a fitted model and make comparisons between alternative models. 
For example, by fitting both the AR(2) and the AR(3) models to the sunspot series, it is 
readily seen that the AR(3) model provides a better fit to the data. Moreover, the fit can be 
improved by using a square root or log transformation of the series, although a Q-Q plot 
still indicates a departure from normality of the standardized residuals. 


7.2.6 Large-Sample Information Matrices and Covariance Estimates 

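In this section, we examine in more detail the information matrix and the covariance matrix of the parameter estimates. Denote by $\mathbf{X} = [\mathbf{U} : \mathbf{V}]$ the $n \times (p + q)$ matrix of the time-lagged $u_t$'s and $v_t$'s defined in (7.2.13) and (7.2.14), when the elements of $\boldsymbol{\beta}_0$ are the true values of the parameters, for a sample size $n$ sufficiently large for end effects to be ignored. Then, since $x_{t,i} = -\partial [a_t]/\partial \beta_i$ and using (7.1.20),

$$E[l_{ij}] \simeq -\frac{1}{2\sigma_a^2} E\left[\frac{\partial^2 S(\boldsymbol{\beta})}{\partial \beta_i \, \partial \beta_j}\right] \simeq -\frac{1}{\sigma_a^2} E\left[\sum_{t=1}^{n} \frac{\partial [a_t]}{\partial \beta_i}\frac{\partial [a_t]}{\partial \beta_j}\right] = -\frac{1}{\sigma_a^2} E\left[\sum_{t=1}^{n} x_{t,i}x_{t,j}\right]$$

the information matrix for $(\boldsymbol{\phi}, \boldsymbol{\theta})$ for the mixed ARMA model is

$$\mathbf{I}(\boldsymbol{\phi}, \boldsymbol{\theta}) = \frac{1}{\sigma_a^2} E\begin{bmatrix} \mathbf{U}'\mathbf{U} & \mathbf{U}'\mathbf{V} \\ \mathbf{V}'\mathbf{U} & \mathbf{V}'\mathbf{V} \end{bmatrix} \tag{7.2.17}$$

$$= n\sigma_a^{-2} \left[\begin{array}{cccc|cccc}
\gamma_{uu}(0) & \gamma_{uu}(1) & \cdots & \gamma_{uu}(p-1) & \gamma_{uv}(0) & \gamma_{uv}(-1) & \cdots & \gamma_{uv}(1-q) \\
\gamma_{uu}(1) & \gamma_{uu}(0) & \cdots & \gamma_{uu}(p-2) & \gamma_{uv}(1) & \gamma_{uv}(0) & \cdots & \gamma_{uv}(2-q) \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
\gamma_{uu}(p-1) & \gamma_{uu}(p-2) & \cdots & \gamma_{uu}(0) & \gamma_{uv}(p-1) & \gamma_{uv}(p-2) & \cdots & \gamma_{uv}(p-q) \\ \hline
\gamma_{uv}(0) & \gamma_{uv}(1) & \cdots & \gamma_{uv}(p-1) & \gamma_{vv}(0) & \gamma_{vv}(1) & \cdots & \gamma_{vv}(q-1) \\
\gamma_{uv}(-1) & \gamma_{uv}(0) & \cdots & \gamma_{uv}(p-2) & \gamma_{vv}(1) & \gamma_{vv}(0) & \cdots & \gamma_{vv}(q-2) \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
\gamma_{uv}(1-q) & \gamma_{uv}(2-q) & \cdots & \gamma_{uv}(p-q) & \gamma_{vv}(q-1) & \gamma_{vv}(q-2) & \cdots & \gamma_{vv}(0)
\end{array}\right] \tag{7.2.18}$$

where $\gamma_{uu}(k)$ and $\gamma_{vv}(k)$ are the autocovariances for the $u_t$'s and the $v_t$'s, and $\gamma_{uv}(k)$ are the cross-covariances defined by

$$\gamma_{uv}(k) = \gamma_{vu}(-k) = E[u_t v_{t+k}] = E[v_t u_{t-k}]$$

The large-sample covariance matrix for the maximum likelihood estimates may be obtained using

$$\mathbf{V}(\hat{\boldsymbol{\phi}}, \hat{\boldsymbol{\theta}}) \simeq \mathbf{I}^{-1}(\boldsymbol{\phi}, \boldsymbol{\theta})$$

Estimates of $\mathbf{I}(\boldsymbol{\phi}, \boldsymbol{\theta})$ and hence of $\mathbf{V}(\hat{\boldsymbol{\phi}}, \hat{\boldsymbol{\theta}})$ may be obtained by evaluating the $u_t$'s and $v_t$'s with $\boldsymbol{\beta}_0 = \hat{\boldsymbol{\beta}}$ and omitting the expectation sign in (7.2.17), leading to $\hat{\mathbf{V}}(\hat{\boldsymbol{\phi}}, \hat{\boldsymbol{\theta}}) = (\mathbf{X}_{\hat{\beta}}'\mathbf{X}_{\hat{\beta}})^{-1}\hat{\sigma}_a^2$, or by substituting standard sample estimates of the autocovariances and cross-covariances in (7.2.18). Theoretical large-sample results can be obtained by noticing that, with the elements of $\boldsymbol{\beta}_0$ equal to the true values of the parameters, equations (7.2.13) and (7.2.14) imply that the derived series $u_t$ and $v_t$ follow autoregressive processes defined by

$$\phi(B)u_t = a_t \qquad \theta(B)v_t = -a_t$$

It follows that the autocovariances that appear in (7.2.18) are those for pure autoregressive processes, and the cross-covariances are the negative of those between two such processes generated by the same $a_t$'s.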

We illustrate the use of this result with a few examples. 

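Covariance Matrix of Parameter Estimates for AR(p) and MA(q) Processes. Let $\boldsymbol{\Gamma}_p(\boldsymbol{\phi})$ be the $p \times p$ autocovariance matrix of $p$ successive observations from an AR($p$) process with parameters $\boldsymbol{\phi}' = (\phi_1, \phi_2, \ldots, \phi_p)$. Then, using (7.2.18), the $p \times p$ covariance matrix of the estimates $\hat{\boldsymbol{\phi}}$ is given by

$$\mathbf{V}(\hat{\boldsymbol{\phi}}) \simeq n^{-1}\sigma_a^2 \boldsymbol{\Gamma}_p^{-1}(\boldsymbol{\phi}) \tag{7.2.19}$$

Let $\boldsymbol{\Gamma}_q(\boldsymbol{\theta})$ be the $q \times q$ autocovariance matrix of $q$ successive observations from an AR($q$) process with parameters $\boldsymbol{\theta}' = (\theta_1, \theta_2, \ldots, \theta_q)$. Then, using (7.2.18), the $q \times q$ covariance matrix of the estimates $\hat{\boldsymbol{\theta}}$ in an MA($q$) model is

$$\mathbf{V}(\hat{\boldsymbol{\theta}}) \simeq n^{-1}\sigma_a^2 \boldsymbol{\Gamma}_q^{-1}(\boldsymbol{\theta}) \tag{7.2.20}$$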

Covariances for the Zeros of an ARMA Process. It is occasionally useful to parameterize an ARMA process in terms of the zeros of $\phi(B)$ and $\theta(B)$. In this case, a particularly simple form is obtained for the covariance matrix of the parameter estimates.

Consider the ARMA($p$, $q$) process parameterized in terms of its zeros (assumed to be real and distinct), so that

$$\prod_{i=1}^{p} (1 - G_i B)\, w_t = \prod_{j=1}^{q} (1 - H_j B)\, a_t$$

or

$$a_t = \prod_{i=1}^{p} (1 - G_i B) \prod_{j=1}^{q} (1 - H_j B)^{-1} w_t$$

The derivatives of the $a_t$'s are then such that

$$u_{t,i} = -\frac{\partial a_t}{\partial G_i} = (1 - G_i B)^{-1} a_{t-1}$$
$$v_{t,j} = -\frac{\partial a_t}{\partial H_j} = -(1 - H_j B)^{-1} a_{t-1}$$

Hence, using (7.2.18), for large samples, the information matrix for the roots is such that

$$n^{-1}\mathbf{I}(\mathbf{G}, \mathbf{H}) = \left[\begin{array}{ccc|ccc}
(1 - G_1^2)^{-1} & \cdots & (1 - G_1 G_p)^{-1} & -(1 - G_1 H_1)^{-1} & \cdots & -(1 - G_1 H_q)^{-1} \\
\vdots & & \vdots & \vdots & & \vdots \\
(1 - G_1 G_p)^{-1} & \cdots & (1 - G_p^2)^{-1} & -(1 - G_p H_1)^{-1} & \cdots & -(1 - G_p H_q)^{-1} \\ \hline
-(1 - G_1 H_1)^{-1} & \cdots & -(1 - G_p H_1)^{-1} & (1 - H_1^2)^{-1} & \cdots & (1 - H_1 H_q)^{-1} \\
\vdots & & \vdots & \vdots & & \vdots \\
-(1 - G_1 H_q)^{-1} & \cdots & -(1 - G_p H_q)^{-1} & (1 - H_1 H_q)^{-1} & \cdots & (1 - H_q^2)^{-1}
\end{array}\right] \tag{7.2.21}$$

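Examples: For an AR(2) process $(1 - G_1 B)(1 - G_2 B)w_t = a_t$, we have

$$\mathbf{V}(\hat{G}_1, \hat{G}_2) \simeq n^{-1}\begin{bmatrix} (1 - G_1^2)^{-1} & (1 - G_1 G_2)^{-1} \\ (1 - G_1 G_2)^{-1} & (1 - G_2^2)^{-1} \end{bmatrix}^{-1} = \frac{1}{n}\,\frac{1 - G_1 G_2}{(G_1 - G_2)^2}\begin{bmatrix} (1 - G_1^2)(1 - G_1 G_2) & -(1 - G_1^2)(1 - G_2^2) \\ -(1 - G_1^2)(1 - G_2^2) & (1 - G_2^2)(1 - G_1 G_2) \end{bmatrix} \tag{7.2.22}$$

Exactly parallel results will be obtained for a second-order moving average process.

Similarly, for the ARMA(1, 1) process $(1 - \phi B)w_t = (1 - \theta B)a_t$, on setting $\phi = G_1$ and $\theta = H_1$ in (7.2.21), we obtain

$$\mathbf{V}(\hat{\phi}, \hat{\theta}) \simeq n^{-1}\begin{bmatrix} (1 - \phi^2)^{-1} & -(1 - \phi\theta)^{-1} \\ -(1 - \phi\theta)^{-1} & (1 - \theta^2)^{-1} \end{bmatrix}^{-1} = \frac{1}{n}\,\frac{1 - \phi\theta}{(\phi - \theta)^2}\begin{bmatrix} (1 - \phi^2)(1 - \phi\theta) & (1 - \phi^2)(1 - \theta^2) \\ (1 - \phi^2)(1 - \theta^2) & (1 - \theta^2)(1 - \phi\theta) \end{bmatrix} \tag{7.2.23}$$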
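The closed form (7.2.23) can be checked numerically by inverting the $2 \times 2$ information matrix directly. The sketch below does this for the ARMA(1, 1) estimates reported for Series A in Table 7.6 ($\hat{\phi} = 0.92$, $\hat{\theta} = 0.58$, $n = 197$); the helper names are hypothetical, and plugging estimates into a large-sample formula is itself an approximation.

```python
import math


def inv2(m):
    """Inverse of a 2 x 2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def info_arma11(phi, theta):
    """n^{-1} I(phi, theta): the p = q = 1 case of (7.2.21)."""
    cross = -1.0 / (1 - phi * theta)
    return [[1.0 / (1 - phi ** 2), cross], [cross, 1.0 / (1 - theta ** 2)]]


def cov_arma11_closed(phi, theta, n):
    """Closed-form covariance matrix (7.2.23)."""
    c = (1 - phi * theta) / (n * (phi - theta) ** 2)
    off = c * (1 - phi ** 2) * (1 - theta ** 2)
    return [[c * (1 - phi ** 2) * (1 - phi * theta), off],
            [off, c * (1 - theta ** 2) * (1 - phi * theta)]]


phi, theta, n = 0.92, 0.58, 197
numeric = [[v / n for v in row] for row in inv2(info_arma11(phi, theta))]
closed = cov_arma11_closed(phi, theta, n)

se_phi = math.sqrt(closed[0][0])
se_theta = math.sqrt(closed[1][1])
print(round(se_phi, 3), round(se_theta, 3))
```

The two standard errors round to 0.04 and 0.08, in line with the values shown under the Series A coefficients in Table 7.6.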


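The results for these two processes illustrate a duality property between the information matrices for the autoregressive model and the general ARMA($p$, $q$) model. Namely, suppose that the information matrix for parameters $(\mathbf{G}, \mathbf{H})$ of the ARMA($p$, $q$) model

$$\prod_{i=1}^{p} (1 - G_i B)\, w_t = \prod_{j=1}^{q} (1 - H_j B)\, a_t$$

is denoted as $\mathbf{I}\{\mathbf{G}, \mathbf{H} \mid (p, q)\}$, and suppose, correspondingly, that the information matrix for the parameters $(\mathbf{G}, \mathbf{H})$ in the pure AR($p + q$) model

$$\prod_{i=1}^{p} (1 - G_i B) \prod_{j=1}^{q} (1 - H_j B)\, w_t = a_t$$

is denoted as

$$\mathbf{I}\{\mathbf{G}, \mathbf{H} \mid (p + q, 0)\} = \begin{bmatrix} \mathbf{I}_{GG} & \mathbf{I}_{GH} \\ \mathbf{I}_{GH}' & \mathbf{I}_{HH} \end{bmatrix}$$

where the matrix is partitioned after the $p$th row and column. Then, for moderate and large samples, we can see directly from (7.2.21) that

$$\mathbf{I}\{\mathbf{G}, \mathbf{H} \mid (p, q)\} \simeq \mathbf{I}\{\mathbf{G}, -\mathbf{H} \mid (p + q, 0)\} = \begin{bmatrix} \mathbf{I}_{GG} & -\mathbf{I}_{GH} \\ -\mathbf{I}_{GH}' & \mathbf{I}_{HH} \end{bmatrix} \tag{7.2.24}$$

Hence, since for moderate and large samples, the inverse of the information matrix provides a close approximation to the covariance matrix $\mathbf{V}(\hat{\mathbf{G}}, \hat{\mathbf{H}})$ of the parameter estimates, we have, correspondingly,

$$\mathbf{V}\{\hat{\mathbf{G}}, \hat{\mathbf{H}} \mid (p, q)\} \simeq \mathbf{V}\{\hat{\mathbf{G}}, -\hat{\mathbf{H}} \mid (p + q, 0)\} \tag{7.2.25}$$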


7.3 SOME ESTIMATION RESULTS FOR SPECIFIC MODELS 

In Appendices A7.3, A7.4, and A7.5, some estimation results for special cases are derived. 
These, and results obtained earlier in this chapter, are summarized here for reference. 


7.3.1 Autoregressive Processes 

It is possible to obtain estimates of the parameters of a pure autoregressive process by 
solving certain linear equations. We show in Appendix A7.4: 

1. How exact least-squares estimates may be obtained by solving a linear system of 
equations (see also Section 7.5.3). 

2. How, by slight modification of the coefficients in these equations, a close approximation to the exact maximum likelihood equations may be obtained.

3. How conditional least-squares estimates, as defined in Section 7.1.3, may be obtained by solving a system of linear equations of the form of the standard linear regression model normal equations.

4. How estimates that are approximations to the least-squares estimates and to the maximum likelihood estimates may be obtained using the estimated autocorrelations as coefficients in the linear Yule-Walker equations.

The estimates obtained in item 1 are, of course, identical with those given by direct minimization of $S(\boldsymbol{\phi})$, as described in general terms in Section 7.2. The estimates in item 4 are the well-known approximations due to Yule and Walker. They are useful as first estimates at the identification stage but can differ appreciably from estimates 1, 2, or 3, in some cases. For instance, differences can occur for an AR(2) model if the parameter estimates $\hat{\phi}_1$ and $\hat{\phi}_2$ are highly correlated, as is the case for the AR(2) model fitted to Series E in Table 7.6.


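Yule-Walker Estimates. The Yule-Walker estimates (6.3.6) are

$$\hat{\boldsymbol{\phi}} = \mathbf{R}^{-1}\mathbf{r}$$

where

$$\mathbf{R} = \begin{bmatrix} 1 & r_1 & \cdots & r_{p-1} \\ r_1 & 1 & \cdots & r_{p-2} \\ \vdots & \vdots & & \vdots \\ r_{p-1} & r_{p-2} & \cdots & 1 \end{bmatrix} \qquad \mathbf{r} = \begin{bmatrix} r_1 \\ r_2 \\ \vdots \\ r_p \end{bmatrix}$$

In particular, the estimates for the AR(1) and the AR(2) processes are

$$\text{AR(1)}: \quad \hat{\phi}_1 = r_1$$
$$\text{AR(2)}: \quad \hat{\phi}_1 = \frac{r_1(1 - r_2)}{1 - r_1^2} \qquad \hat{\phi}_2 = \frac{r_2 - r_1^2}{1 - r_1^2} \tag{7.3.2}$$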

It is shown in Appendix A7.4 that an approximation to $S(\hat{\boldsymbol{\phi}})$ is provided by

$$S(\hat{\boldsymbol{\phi}}) = \sum_{t=1}^{n} \tilde{w}_t^2\,(1 - \mathbf{r}'\hat{\boldsymbol{\phi}}) \tag{7.3.3}$$

so that

$$\hat{\sigma}_a^2 = \frac{S(\hat{\boldsymbol{\phi}})}{n} = c_0(1 - \mathbf{r}'\hat{\boldsymbol{\phi}}) \tag{7.3.4}$$

where $c_0$ is the sample variance of the $w_t$'s. A parallel expression relates $\sigma_a^2$ and $\gamma_0$, the theoretical variance of the $w_t$'s [see (3.2.8)], namely,

$$\sigma_a^2 = \gamma_0(1 - \boldsymbol{\rho}'\boldsymbol{\phi})$$

where the elements of $\boldsymbol{\rho}$ and of $\boldsymbol{\phi}$ are the theoretical values. Thus, from (7.2.19) and Appendix A7.5, the covariance matrix for the estimates $\hat{\boldsymbol{\phi}}$ is

$$\mathbf{V}(\hat{\boldsymbol{\phi}}) \simeq n^{-1}\sigma_a^2\boldsymbol{\Gamma}^{-1} = n^{-1}(1 - \boldsymbol{\rho}'\boldsymbol{\phi})\mathbf{P}^{-1} \tag{7.3.5}$$

where $\boldsymbol{\Gamma}$ and $\mathbf{P} = (1/\gamma_0)\boldsymbol{\Gamma}$ are the autocovariance and autocorrelation matrices of $p$ successive values of the AR($p$) process.

In particular, for the AR(1) and AR(2) processes, we find that

$$\text{AR(1)}: \quad V(\hat{\phi}) \simeq n^{-1}(1 - \phi^2) \tag{7.3.6}$$
$$\text{AR(2)}: \quad \mathbf{V}(\hat{\phi}_1, \hat{\phi}_2) \simeq n^{-1}\begin{bmatrix} 1 - \phi_2^2 & -\phi_1(1 + \phi_2) \\ -\phi_1(1 + \phi_2) & 1 - \phi_2^2 \end{bmatrix} \tag{7.3.7}$$

Estimates of the variances and covariances are obtained by substituting estimates of the parameters in (7.3.5). Thus,

$$\hat{\mathbf{V}}(\hat{\boldsymbol{\phi}}) = n^{-1}(1 - \mathbf{r}'\hat{\boldsymbol{\phi}})\mathbf{R}^{-1} \tag{7.3.8}$$

Using (7.3.7) it is readily shown that the correlation between the estimates of the AR(2) parameters is approximately equal to $-\rho_1$. This implies, in particular, that a large lag-1 correlation in the series can give rise to unstable estimates, which may explain the differences between the Yule-Walker and the least-squares estimates noted above.
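The AR(2) formulas of this subsection can be exercised on a case where the answer is known. In the sketch below, the theoretical autocorrelations of an AR(2) process with $\phi_1 = 0.5$, $\phi_2 = 0.3$ (values chosen only for illustration) are fed into (7.3.2), and the covariance matrix (7.3.7) is compared with the general form (7.3.5). The function names are ours.

```python
import math


def yule_walker_ar2(r1, r2):
    """Yule-Walker estimates (7.3.2) for an AR(2) process."""
    denom = 1 - r1 ** 2
    return r1 * (1 - r2) / denom, (r2 - r1 ** 2) / denom


def ar2_cov(phi1, phi2, n):
    """Large-sample covariance matrix (7.3.7) of the AR(2) estimates."""
    diag = (1 - phi2 ** 2) / n
    off = -phi1 * (1 + phi2) / n
    return [[diag, off], [off, diag]]


# Theoretical autocorrelations of an AR(2) process (illustrative parameters).
phi1, phi2 = 0.5, 0.3
rho1 = phi1 / (1 - phi2)
rho2 = phi1 * rho1 + phi2

est1, est2 = yule_walker_ar2(rho1, rho2)   # should recover (0.5, 0.3)

n = 100
V = ar2_cov(phi1, phi2, n)
corr = V[0][1] / math.sqrt(V[0][0] * V[1][1])   # approximately -rho1

# Same matrix from (7.3.5): n^{-1} (1 - rho'phi) P^{-1}, with P = [[1, rho1], [rho1, 1]].
scale = (1 - (rho1 * phi1 + rho2 * phi2)) / (n * (1 - rho1 ** 2))
V_alt = [[scale, -scale * rho1], [-scale * rho1, scale]]

# AR(2) model fitted to Series E in Table 7.6: phi = (1.42, -0.73), n = 100.
se_E = math.sqrt(ar2_cov(1.42, -0.73, 100)[0][0])
corr_E = -1.42 / (1 + 0.73)

print(round(est1, 3), round(est2, 3), round(corr, 3), round(se_E, 3), round(corr_E, 2))
```

For the Series E fit, the implied standard error of about 0.07 agrees with Table 7.6, and the implied correlation of about $-0.82$ between the two estimates illustrates the instability mentioned in the text.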


7.3.2 Moving Average Processes 

Maximum likelihood estimates $\hat{\boldsymbol{\theta}}$ for moving average processes may, in simple cases, be obtained graphically, as illustrated in Section 7.1.6, or more generally, by the iterative calculation described in Section 7.2.1. From (7.2.20), it follows that for moderate and large samples, the covariance matrix for the estimates of the parameters of a $q$th-order moving average process is of the same form as the corresponding matrix for an autoregressive process of the same order. Thus, for the MA(1) and MA(2) processes, we find, corresponding to (7.3.6) and (7.3.7),

$$\text{MA(1)}: \quad V(\hat{\theta}) \simeq n^{-1}(1 - \theta^2) \tag{7.3.9}$$
$$\text{MA(2)}: \quad \mathbf{V}(\hat{\theta}_1, \hat{\theta}_2) \simeq n^{-1}\begin{bmatrix} 1 - \theta_2^2 & -\theta_1(1 + \theta_2) \\ -\theta_1(1 + \theta_2) & 1 - \theta_2^2 \end{bmatrix} \tag{7.3.10}$$


7.3.3 Mixed Processes 

Maximum likelihood estimates $(\hat{\boldsymbol{\phi}}, \hat{\boldsymbol{\theta}})$ for mixed processes, as for moving average processes,
may be obtained graphically in simple cases, and more generally, by iterative calculation.
For moderate and large samples, the covariance matrix may be obtained by evaluating and
inverting the information matrix (7.2.18). In the important special case of the ARMA(1, 1)
process

$$(1 - \phi B)w_t = (1 - \theta B)a_t$$

we obtain, as in (7.2.23),

$$\mathbf{V}(\hat{\phi}, \hat{\theta}) \simeq n^{-1}\,\frac{1 - \phi\theta}{(\phi - \theta)^2}
\begin{bmatrix}
(1 - \phi^2)(1 - \phi\theta) & (1 - \phi^2)(1 - \theta^2) \\
(1 - \phi^2)(1 - \theta^2) & (1 - \theta^2)(1 - \phi\theta)
\end{bmatrix} \qquad (7.3.11)$$

It is noted that when $\phi = \theta$, the variances of $\hat{\phi}$ and $\hat{\theta}$ are infinite. This is to be expected,
for in this case the factor $(1 - \phi B) = (1 - \theta B)$ cancels on both sides of the model, which
becomes

$$w_t = a_t$$

This is a particular case of parameter redundancy, which we discuss further in Section
7.3.5.
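
To see how quickly the variances in (7.3.11) grow as $\theta$ approaches $\phi$, the matrix can be evaluated for a shrinking gap $\phi - \theta$. The sketch below is only illustrative: the function name, the series length, and the parameter values are hypothetical.

```python
def arma11_estimate_covariance(phi, theta, n):
    """Large-sample covariance matrix (7.3.11) of (phi_hat, theta_hat)."""
    if phi == theta:
        raise ValueError("phi == theta: the common factor cancels and the variances are infinite")
    scale = (1.0 - phi * theta) / (n * (phi - theta) ** 2)
    var_phi = scale * (1.0 - phi ** 2) * (1.0 - phi * theta)
    var_theta = scale * (1.0 - theta ** 2) * (1.0 - phi * theta)
    cov = scale * (1.0 - phi ** 2) * (1.0 - theta ** 2)
    return [[var_phi, cov], [cov, var_theta]]


n = 200      # hypothetical series length
phi = 0.5    # hypothetical autoregressive parameter
for gap in (0.5, 0.1, 0.01):
    v = arma11_estimate_covariance(phi, phi - gap, n)
    print(f"phi - theta = {gap:5.2f}: se(phi_hat) = {v[0][0] ** 0.5:.3f}, "
          f"se(theta_hat) = {v[1][1] ** 0.5:.3f}")
```

The standard errors grow roughly in proportion to $1/|\phi - \theta|$, which is the numerical face of the redundancy discussed in the text.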


7.3.4 Separation of Linear and Nonlinear Components in Estimation 

It is occasionally of interest to make an analysis in which the estimation of the parameters of
the mixed model is separated into its basic linear and nonlinear parts. Consider the general
mixed model $\phi(B)w_t = \theta(B)a_t$, which we write as $a_t = \phi(B)\theta^{-1}(B)w_t$, or

$$a_t = \phi(B)(\varepsilon_t \mid \boldsymbol{\theta}) \qquad (7.3.12)$$

where

$$(\varepsilon_t \mid \boldsymbol{\theta}) = \theta^{-1}(B)w_t$$

that is,

$$w_t = \theta(B)(\varepsilon_t \mid \boldsymbol{\theta}) \qquad (7.3.13)$$

For any given set of $\theta$'s, the $\varepsilon_t$'s may be calculated recursively from (7.3.13), which may
be written as

$$\varepsilon_t = w_t + \theta_1\varepsilon_{t-1} + \theta_2\varepsilon_{t-2} + \cdots + \theta_q\varepsilon_{t-q}$$

The recursion may be started by setting unknown $\varepsilon_t$'s equal to zero. Having calculated
the $\varepsilon_t$'s, the conditional estimates $\hat{\boldsymbol{\phi}}_\theta$ may readily be obtained. These are the estimated
autoregressive parameters in the linear model (7.3.12), which may be written as

$$a_t = \varepsilon_t - \phi_1\varepsilon_{t-1} - \phi_2\varepsilon_{t-2} - \cdots - \phi_p\varepsilon_{t-p} \qquad (7.3.14)$$

As discussed in Section 7.3.1, the least-squares estimates of the autoregressive parameters
may be found by direct solution of a set of linear equations. In simple cases, we can
examine the behavior of $S(\hat{\boldsymbol{\phi}}_\theta, \boldsymbol{\theta})$ and find its minimum by computing $S(\hat{\boldsymbol{\phi}}_\theta, \boldsymbol{\theta})$ on a grid
of $\boldsymbol{\theta}$ values and plotting contours.
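
The two-stage calculation can be sketched for the ARMA(1, 1) case as follows. This is a minimal illustration under stated assumptions, not the procedure used for the figures in this chapter: the series is simulated (with hypothetical values $\phi = 0.8$, $\theta = 0.3$), the linear step uses ordinary least squares on lagged $\varepsilon_t$'s, and all function names are our own.

```python
import random


def epsilon_series(w, thetas):
    """Nonlinear step: eps_t = w_t + theta_1 eps_{t-1} + ... + theta_q eps_{t-q},
    with unknown pre-sample eps values set to zero."""
    eps = []
    for t, w_t in enumerate(w):
        value = w_t
        for j, theta_j in enumerate(thetas, start=1):
            if t - j >= 0:
                value += theta_j * eps[t - j]
        eps.append(value)
    return eps


def conditional_ar1_fit(eps):
    """Linear step: least-squares phi in a_t = eps_t - phi * eps_{t-1},
    together with the resulting sum of squares."""
    num = sum(eps[t] * eps[t - 1] for t in range(1, len(eps)))
    den = sum(eps[t - 1] ** 2 for t in range(1, len(eps)))
    phi = num / den
    s = sum((eps[t] - phi * eps[t - 1]) ** 2 for t in range(1, len(eps)))
    return phi, s


def grid_search(w, theta_grid):
    """Return (theta, phi_hat_theta, S) at the grid point with the smallest S."""
    best = None
    for theta in theta_grid:
        phi, s = conditional_ar1_fit(epsilon_series(w, [theta]))
        if best is None or s < best[2]:
            best = (theta, phi, s)
    return best


def simulate_arma11(n, phi, theta, seed=0):
    """Simulate w_t = phi w_{t-1} + a_t - theta a_{t-1} with unit-variance normal shocks."""
    rng = random.Random(seed)
    w, w_prev, a_prev = [], 0.0, 0.0
    for _ in range(n):
        a = rng.gauss(0.0, 1.0)
        w_t = phi * w_prev + a - theta * a_prev
        w.append(w_t)
        w_prev, a_prev = w_t, a
    return w


w = simulate_arma11(2000, phi=0.8, theta=0.3)
theta_grid = [i / 10 for i in range(-9, 10)]   # stays inside the invertibility region
theta_hat, phi_hat, s_min = grid_search(w, theta_grid)
print(theta_hat, round(phi_hat, 3), round(s_min, 1))
```

Only the $\theta$'s are searched over; for each trial value the autoregressive parameter is obtained in closed form, which is the point of the separation.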

Example Using Series C. One possible model for Series C considered earlier is the
ARIMA(1, 1, 0) model $(1 - \phi B)w_t = a_t$ with $w_t = \nabla z_t$ and $E[w_t] = 0$. Consider now
the somewhat more elaborate model $(1 - \phi B)w_t = (1 - \theta_1 B - \theta_2 B^2)a_t$. Following the
argument given above, the process may be thought of as resulting from a combination of the
nonlinear model $\varepsilon_t = w_t + \theta_1\varepsilon_{t-1} + \theta_2\varepsilon_{t-2}$ and the linear model $a_t = \varepsilon_t - \phi\varepsilon_{t-1}$.

For each choice of the nonlinear parameters $\boldsymbol{\theta} = (\theta_1, \theta_2)$ within the invertibility region,
a set of $\varepsilon_t$'s was calculated recursively. Using the Yule-Walker approximation, an estimate
$\hat{\phi}_\theta = r_1(\varepsilon)$ could now be obtained together with

$$S(\hat{\phi}_\theta, \boldsymbol{\theta}) \simeq \sum_{t=1}^{n} \varepsilon_t^2 \,[1 - r_1^2(\varepsilon)]$$

This sum of squares was plotted for a grid of values of $\theta_1$ and $\theta_2$ and its contours are shown
in Figure 7.6. We see that a minimum close to $\theta_1 = \theta_2 = 0$ is indicated, at which point
$r_1(\varepsilon) = 0.805$. Thus, within the whole class of models of order (1, 1, 2), the simple (1, 1, 0)
model $(1 - 0.8B)\nabla z_t = a_t$ is confirmed to provide an adequate representation.


FIGURE 7.6 Contours of $S(\hat{\phi}_\theta, \boldsymbol{\theta})$ for Series C plotted over the admissible parameter space for
the $\theta$'s.


7.3.5 Parameter Redundancy 

The model $\phi(B)w_t = \theta(B)a_t$ is identical to the model

$$(1 - \alpha B)\phi(B)w_t = (1 - \alpha B)\theta(B)a_t$$

in which both autoregressive and moving average operators are multiplied by the same
factor, $1 - \alpha B$. Serious difficulties in the estimation procedure will arise if a model is fitted
that contains a redundant factor. Therefore, care is needed in avoiding the situation where
redundant or near-redundant common factors occur. The existence of redundancy is not
always obvious. For example, one can see the common factor in the ARMA(2, 1) model

$$(1 - 1.3B + 0.4B^2)w_t = (1 - 0.5B)a_t$$

only after factoring the left-hand side to obtain

$$(1 - 0.5B)(1 - 0.8B)w_t = (1 - 0.5B)a_t$$

that is, $(1 - 0.8B)w_t = a_t$.

In practice, it is not just exact cancellation that causes difficulties, but also
near-cancellation. For example, suppose that the true model was

$$(1 - 0.4B)(1 - 0.8B)w_t = (1 - 0.5B)a_t \qquad (7.3.15)$$

If an attempt was made to fit this model as ARMA(2, 1), extreme instability in the parameter
estimates could arise because of near-cancellation of the factors $(1 - 0.4B)$ and $(1 - 0.5B)$,
on the left- and right-hand sides. In this case, combinations of parameter values yielding
similar $[a_t]$'s and so similar likelihoods can be found, and a change of parameter value on
the left can be nearly compensated by a suitable change on the right. The sum-of-squares
contour surfaces in the three-dimensional parameter space will thus approach obliquely
oriented cylinders, and a line of "near least-squares" solutions rather than a clearly defined
point minimum will be found.

From a slightly different viewpoint, we can write the model (7.3.15) in terms of an
infinite autoregressive operator. Making the necessary expansion, we find that

$$(1 - 0.700B - 0.030B^2 - 0.015B^3 - 0.008B^4 - \cdots)w_t = a_t$$

Thus, very nearly, the model is

$$(1 - 0.7B)w_t = a_t \qquad (7.3.16)$$

The instability of the estimates, obtained by attempting to fit an ARMA(2, 1) model, would
occur because we would be trying to fit three parameters in a situation that could almost be
represented by one.
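
The expansion of (7.3.15) into an infinite autoregressive operator can be reproduced by dividing the autoregressive polynomial by the moving average polynomial as power series in $B$. The helper names below are hypothetical; the sketch assumes polynomials are stored as coefficient lists in increasing powers of $B$.

```python
def poly_multiply(a, b):
    """Multiply two polynomials in B given as coefficient lists [c0, c1, ...]."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, a_i in enumerate(a):
        for j, b_j in enumerate(b):
            out[i + j] += a_i * b_j
    return out


def series_quotient(numerator, denominator, n_terms):
    """First n_terms coefficients of numerator(B) / denominator(B) as a power series."""
    coeffs = []
    for k in range(n_terms):
        c = numerator[k] if k < len(numerator) else 0.0
        # Subtract the contribution of lower-order quotient terms already found.
        for j in range(1, min(k, len(denominator) - 1) + 1):
            c -= denominator[j] * coeffs[k - j]
        coeffs.append(c / denominator[0])
    return coeffs


ar_operator = poly_multiply([1.0, -0.4], [1.0, -0.8])   # (1 - 0.4B)(1 - 0.8B)
ma_operator = [1.0, -0.5]                                # (1 - 0.5B)
pi_operator = series_quotient(ar_operator, ma_operator, 6)
print([round(c, 4) for c in pi_operator])
```

After the first lag the coefficients are small and halve at each step, so the operator is close to $1 - 0.7B$, in line with (7.3.16).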

A principal reason for going through the identification procedure prior to fitting the 
model is to avoid difficulties arising from parameter redundancy and to achieve parsimony 
in parameterization. 

Redundancy in the ARMA(1, 1) Model. The simplest model where the possibility occurs
for direct cancellation of factors is the ARMA(1, 1) process:

$$(1 - \phi B)w_t = (1 - \theta B)a_t$$

In particular, if $\phi = \theta$, then whatever common value they have, $w_t = a_t$, so that $w_t$ is
generated by a white noise process. The data then cannot supply information about the
common parameter, and using (7.3.11), $\hat{\phi}$ and $\hat{\theta}$ have infinite variances. Furthermore,
whatever the common value of $\phi$ and $\theta$, $S(\phi, \theta)$ must be constant on the line $\phi = \theta$. This is
illustrated in Figure 7.7, which shows a sum-of-squares plot for the data of Series A.
However, for these data, the least-squares values $\hat{\phi} = 0.92$ and $\hat{\theta} = 0.58$ correspond to
a point that is not particularly close to the line $\phi = \theta$, and no difficulties occur in the
estimation of these parameters.

FIGURE 7.7 Sum-of-squares plot for Series A.

In practice, if the identification technique we have recommended is adopted, these
difficulties will be avoided. An ARMA(1, 1) process in which $\phi$ is very nearly equal to $\theta$
will normally be identified as white noise, or if the difference is nonnegligible, as an AR(1)
or MA(1) process with a single small coefficient.

In summary:

1. We should avoid mixed models containing near common factors, and we should be
alert to the difficulties that can result.

2. We will automatically avoid such models if we use identification and estimation
procedures intelligently.


7.4 LIKELIHOOD FUNCTION BASED ON THE STATE-SPACE MODEL 

In Section 5.5, we introduced the state-space model formulation of the ARMA process along 
with Kalman filtering and described its use for prediction. This approach also provides a 
convenient method to evaluate the exact likelihood function for an ARMA model. The use 
of this approach has been suggested by Jones (1980), Gardner et al. (1980), and others. 
The state-space model form of the ARMA(p, q) model given in Section 5.5 is 

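$$\mathbf{Y}_t = \boldsymbol{\Phi}\mathbf{Y}_{t-1} + \boldsymbol{\Psi}a_t \quad \text{and} \quad w_t = \mathbf{H}\mathbf{Y}_t \qquad (7.4.1)$$

where $\mathbf{Y}_t' = (w_t, \hat{w}_t(1), \ldots, \hat{w}_t(r - 1))$, $r = \max(p, q + 1)$, $\mathbf{H} = (1, 0, \ldots, 0)$,

$$\boldsymbol{\Phi} =
\begin{bmatrix}
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1 \\
\phi_r & \phi_{r-1} & \phi_{r-2} & \cdots & \phi_1
\end{bmatrix}$$

and $\boldsymbol{\Psi}' = (1, \psi_1, \ldots, \psi_{r-1})$. The Kalman filter equations (5.5.6)-(5.5.9) provide
one-step-ahead forecasts $\hat{\mathbf{Y}}_{t|t-1} = E[\mathbf{Y}_t \mid w_{t-1}, \ldots, w_1]$ of the state vector $\mathbf{Y}_t$ and the error covariance
matrix $\mathbf{V}_{t|t-1} = E[(\mathbf{Y}_t - \hat{\mathbf{Y}}_{t|t-1})(\mathbf{Y}_t - \hat{\mathbf{Y}}_{t|t-1})']$. Specifically, for the state-space form of the
ARMA($p$, $q$) model, these recursive equations are

$$\hat{\mathbf{Y}}_{t|t} = \hat{\mathbf{Y}}_{t|t-1} + \mathbf{K}_t(w_t - \hat{w}_{t|t-1}) \quad \text{with} \quad \mathbf{K}_t = \mathbf{V}_{t|t-1}\mathbf{H}'[\mathbf{H}\mathbf{V}_{t|t-1}\mathbf{H}']^{-1} \qquad (7.4.2)$$

where $\hat{w}_{t|t-1} = \mathbf{H}\hat{\mathbf{Y}}_{t|t-1}$, and

$$\hat{\mathbf{Y}}_{t|t-1} = \boldsymbol{\Phi}\hat{\mathbf{Y}}_{t-1|t-1}, \qquad \mathbf{V}_{t|t-1} = \boldsymbol{\Phi}\mathbf{V}_{t-1|t-1}\boldsymbol{\Phi}' + \sigma_a^2\boldsymbol{\Psi}\boldsymbol{\Psi}' \qquad (7.4.3)$$

with

$$\mathbf{V}_{t|t} = [\mathbf{I} - \mathbf{K}_t\mathbf{H}]\mathbf{V}_{t|t-1} \qquad (7.4.4)$$

for $t = 1, 2, \ldots, n$. In particular, then, the first component of the forecast vector is
$\hat{w}_{t|t-1} = \mathbf{H}\hat{\mathbf{Y}}_{t|t-1} = E[w_t \mid w_{t-1}, \ldots, w_1]$, $a_{t|t-1} = w_t - \hat{w}_{t|t-1}$ is the one-step innovation,
and the element $\sigma_a^2 v_t = \mathbf{H}\mathbf{V}_{t|t-1}\mathbf{H}' = E[(w_t - \hat{w}_{t|t-1})^2]$ is the one-step forecast error
variance.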

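To obtain the exact likelihood function of the vector of $n$ observations $\mathbf{w}' =
(w_1, w_2, \ldots, w_n)$ using the above results, we note that the joint distribution of $\mathbf{w}$ can
be factored as

$$p(\mathbf{w} \mid \boldsymbol{\phi}, \boldsymbol{\theta}, \sigma_a^2) = \prod_{t=1}^{n} p(w_t \mid w_{t-1}, \ldots, w_1; \boldsymbol{\phi}, \boldsymbol{\theta}, \sigma_a^2) \qquad (7.4.5)$$

where $p(w_t \mid w_{t-1}, \ldots, w_1; \boldsymbol{\phi}, \boldsymbol{\theta}, \sigma_a^2)$ denotes the conditional distribution of $w_t$ given
$w_{t-1}, \ldots, w_1$. Under normality of $a_t$, this conditional distribution is normal with conditional
mean $\hat{w}_{t|t-1} = E[w_t \mid w_{t-1}, \ldots, w_1]$ and conditional variance
$\sigma_a^2 v_t = E[(w_t - \hat{w}_{t|t-1})^2 \mid w_{t-1}, \ldots, w_1]$.
Hence, the joint distribution of $\mathbf{w}$ can be conveniently expressed as

$$p(\mathbf{w} \mid \boldsymbol{\phi}, \boldsymbol{\theta}, \sigma_a^2) = \prod_{t=1}^{n} (2\pi\sigma_a^2 v_t)^{-1/2}
\exp\left[ -\frac{1}{2\sigma_a^2} \sum_{t=1}^{n} \frac{(w_t - \hat{w}_{t|t-1})^2}{v_t} \right] \qquad (7.4.6)$$

where the quantities $\hat{w}_{t|t-1}$ and $\sigma_a^2 v_t$ are easily determined recursively from the Kalman
filter procedure. The initial values needed to start the Kalman filter recursions are given
by $\hat{\mathbf{Y}}_{0|0} = \mathbf{0}$, an $r$-dimensional vector of zeros, and $\mathbf{V}_{0|0} = \text{cov}[\mathbf{Y}_0]$. The elements of $\mathbf{V}_{0|0}$
can readily be determined as a function of the autocovariances $\gamma_k$ and the weights $\psi_k$ of
the ARMA($p$, $q$) process $w_t$, making use of the relation $w_{t+j} = \hat{w}_t(j) + \sum_{k=0}^{j-1} \psi_k a_{t+j-k}$
from Chapter 5. See Jones (1980) for further details. For example, in the case of an
ARMA(1, 1) model for $w_t$, we have $\mathbf{Y}_t' = (w_t, \hat{w}_t(1))$, so

$$\mathbf{V}_{0|0} = \text{cov}[\mathbf{Y}_0] =
\begin{bmatrix}
\gamma_0 & \gamma_1 \\
\gamma_1 & \gamma_0 - \sigma_a^2
\end{bmatrix}
= \sigma_a^2
\begin{bmatrix}
\sigma_a^{-2}\gamma_0 & \sigma_a^{-2}\gamma_1 \\
\sigma_a^{-2}\gamma_1 & \sigma_a^{-2}\gamma_0 - 1
\end{bmatrix}$$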
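
For the ARMA(1, 1) case, the filter recursions and the likelihood (7.4.6) can be sketched in a few lines. This is an illustrative implementation under stated assumptions rather than production code: the function name is our own, the $2 \times 2$ matrix algebra is written out by hand using $\mathbf{H} = (1, 0)$, and the autocovariances use the standard ARMA(1, 1) expressions $\gamma_0 = \sigma_a^2(1 + \theta^2 - 2\phi\theta)/(1 - \phi^2)$ and $\gamma_1 = \sigma_a^2(1 - \phi\theta)(\phi - \theta)/(1 - \phi^2)$.

```python
import math


def arma11_kalman_loglik(w, phi, theta, sigma2=1.0):
    """Exact Gaussian log-likelihood of an ARMA(1, 1) series via the Kalman filter.

    State Y_t = (w_t, w_t(1))', Phi = [[0, 1], [0, phi]], Psi = (1, psi_1)' with
    psi_1 = phi - theta.  Returns (loglik, innovations, v), where sigma2 * v[t]
    is the one-step forecast error variance.
    """
    psi1 = phi - theta
    gamma0 = sigma2 * (1.0 + theta ** 2 - 2.0 * phi * theta) / (1.0 - phi ** 2)
    gamma1 = sigma2 * (1.0 - phi * theta) * (phi - theta) / (1.0 - phi ** 2)

    # Initial conditions: Y_{0|0} = 0 and V_{0|0} = cov[Y_0].
    y = [0.0, 0.0]
    V = [[gamma0, gamma1], [gamma1, gamma0 - sigma2]]

    loglik, innovations, v = 0.0, [], []
    for w_t in w:
        # Prediction: Y_{t|t-1} = Phi Y_{t-1|t-1},
        # V_{t|t-1} = Phi V_{t-1|t-1} Phi' + sigma2 * Psi Psi'.
        y_pred = [y[1], phi * y[1]]
        p00 = V[1][1] + sigma2
        p01 = phi * V[1][1] + sigma2 * psi1
        p11 = phi ** 2 * V[1][1] + sigma2 * psi1 ** 2

        # Innovation and gain K_t = V_{t|t-1} H' / (H V_{t|t-1} H'), with H = (1, 0).
        innov = w_t - y_pred[0]
        k0, k1 = 1.0, p01 / p00

        # Update: Y_{t|t} and V_{t|t} = (I - K_t H) V_{t|t-1}.
        y = [y_pred[0] + k0 * innov, y_pred[1] + k1 * innov]
        V = [[0.0, 0.0], [0.0, p11 - k1 * p01]]

        loglik -= 0.5 * (math.log(2.0 * math.pi * p00) + innov ** 2 / p00)
        innovations.append(innov)
        v.append(p00 / sigma2)
    return loglik, innovations, v


# Hypothetical short series, for illustration only.
w = [1.0, 0.5, -0.2, 0.3]
loglik, innovations, v = arma11_kalman_loglik(w, phi=0.6, theta=0.2)
print(round(loglik, 4), [round(x, 4) for x in v])
```

Setting $\theta = 0$ reduces the model to AR(1), for which $v_1 = 1/(1 - \phi^2)$ and $v_t = 1$ thereafter; this gives a convenient check on the recursions.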


It is also generally the case that the one-step-ahead forecasts $\hat{w}_{t|t-1}$ and the corresponding
error variances $\sigma_a^2 v_t$ rather quickly approach their steady-state forms, in which case the
Kalman filter calculations at some stage (beyond time $t_0$, say) could be switched to the
simpler form $\hat{w}_{t|t-1} = \sum_{i=1}^{p} \phi_i w_{t-i} - \sum_{i=1}^{q} \theta_i a_{t-i|t-i-1}$ and $\sigma_a^2 v_t = \text{var}[a_{t|t-1}] = \sigma_a^2$ for
$t > t_0$, where $a_{t|t-1} = w_t - \hat{w}_{t|t-1}$. For example, refer to Gardner et al. (1980) for further
details. On comparison of (7.4.6) with expressions given earlier in (7.1.5) and (7.1.6), and
also (A7.3.11) and (A7.3.13), the unconditional sum-of-squares function can be represented
in two equivalent forms as

$$S(\boldsymbol{\phi}, \boldsymbol{\theta}) = \sum_{t=1}^{n} [a_t]^2 + [\mathbf{e}_*]'\boldsymbol{\Omega}^{-1}[\mathbf{e}_*] = \sum_{t=1}^{n} \frac{a_{t|t-1}^2}{v_t}$$

where $a_{t|t-1} = w_t - \hat{w}_{t|t-1}$, and also $|\mathbf{M}_n^{(p,q)}|^{-1} = |\boldsymbol{\Omega}||\mathbf{D}| = \prod_{t=1}^{n} v_t$.


Innovations Method. The likelihood function expressed in the form of (7.4.6) is generally
referred to as the innovations form, and the quantities $a_{t|t-1} = w_t - \hat{w}_{t|t-1}$, $t = 1, \ldots, n$,
are the (finite-sample) innovations. Calculation of the likelihood function in this form,
based on the state-space representation of the ARMA process and associated Kalman
filtering algorithms, has been proposed by many authors including Gardner et al. (1980),
Harvey and Phillips (1979), and Jones (1980). The innovations form of the likelihood can
also be obtained without directly using the state-space representation through the use of
an "innovations algorithm" (e.g., see Ansley, 1979; Brockwell and Davis, 1991). This
method essentially involves a Cholesky decomposition of an $n \times n$ band covariance matrix
of the derived MA($q$) process:

$$w_t' = w_t - \phi_1 w_{t-1} - \cdots - \phi_p w_{t-p} = a_t - \theta_1 a_{t-1} - \cdots - \theta_q a_{t-q}$$

More specifically, using the notation of Appendix A7.3, we write the ARMA model
relations for $n$ observations as $\mathbf{L}_\phi\mathbf{w} = \mathbf{L}_\theta\mathbf{a} + \mathbf{F}\mathbf{e}_*$, where $\mathbf{a}' = (a_1, a_2, \ldots, a_n)$ and
$\mathbf{e}_*' = (w_{1-p}, \ldots, w_0, a_{1-q}, \ldots, a_0)$ is the $(p + q)$-dimensional vector of pre-sample values. Then,
the covariance matrix of the vector of derived variables $\mathbf{L}_\phi\mathbf{w}$ is

$$\boldsymbol{\Gamma}_{w'} = \text{cov}[\mathbf{L}_\phi\mathbf{w}] = \text{cov}[\mathbf{L}_\theta\mathbf{a} + \mathbf{F}\mathbf{e}_*] = \sigma_a^2(\mathbf{L}_\theta\mathbf{L}_\theta' + \mathbf{F}\boldsymbol{\Omega}\mathbf{F}') \qquad (7.4.7)$$

which is a band matrix. That is, $\boldsymbol{\Gamma}_{w'}$ is a matrix with nonzero elements only in a band
about the main diagonal of maximum bandwidth $m = \max(p, q)$, and of bandwidth $q$ after
the first $m$ rows since $\text{cov}[w_t', w_{t+j}'] = 0$ for $j > q$. The innovations algorithm obtains the
(square-root-free) Cholesky decomposition of the band matrix $\mathbf{L}_\theta\mathbf{L}_\theta' + \mathbf{F}\boldsymbol{\Omega}\mathbf{F}'$ as $\mathbf{G}\mathbf{D}\mathbf{G}'$,
where $\mathbf{G}$ is a lower triangular band matrix with bandwidth corresponding to that of $\boldsymbol{\Gamma}_{w'}$
and with ones on the diagonal, and $\mathbf{D}$ is a diagonal matrix with positive diagonal elements
$v_t$, $t = 1, \ldots, n$. Hence, $\text{cov}[\mathbf{w}] = \sigma_a^2\mathbf{L}_\phi^{-1}\mathbf{G}\mathbf{D}\mathbf{G}'\mathbf{L}_\phi'^{-1}$ and the quadratic form in the exponent
of the likelihood function (7.4.6) is

$$\mathbf{w}'\{\text{cov}[\mathbf{w}]\}^{-1}\mathbf{w} = \frac{1}{\sigma_a^2}\mathbf{w}'(\mathbf{L}_\phi^{-1}\mathbf{G}\mathbf{D}\mathbf{G}'\mathbf{L}_\phi'^{-1})^{-1}\mathbf{w}
= \frac{1}{\sigma_a^2}\mathbf{e}'\mathbf{D}^{-1}\mathbf{e}
= \frac{1}{\sigma_a^2}\sum_{t=1}^{n} \frac{a_{t|t-1}^2}{v_t} \qquad (7.4.8)$$

where $\mathbf{e} = \mathbf{G}^{-1}\mathbf{L}_\phi\mathbf{w} = (a_{1|0}, a_{2|1}, \ldots, a_{n|n-1})'$ is the vector of innovations, which are
computed recursively from $\mathbf{G}\mathbf{e} = \mathbf{L}_\phi\mathbf{w}$. Thus, the innovations can be obtained
recursively as $a_{1|0} = w_1$, $a_{2|1} = w_2 - \phi_1 w_1 + \theta_{1,1}a_{1|0}$, $\ldots$,
$a_{m|m-1} = w_m - \sum_{i=1}^{m-1} \phi_i w_{m-i} + \sum_{i=1}^{m-1} \theta_{i,m-1}a_{m-i|m-i-1}$, and

$$a_{t|t-1} = w_t - \sum_{i=1}^{p} \phi_i w_{t-i} + \sum_{i=1}^{q} \theta_{i,t-1}a_{t-i|t-i-1} \qquad (7.4.9)$$

for $t > m$, where the $t$th row of the matrix $\mathbf{G}$ has the form

$$[0, \ldots, 0, -\theta_{q,t-1}, \ldots, -\theta_{1,t-1}, 1, 0, \ldots, 0]$$

with the 1 in the $t$th (i.e., diagonal) position. In addition, the coefficients $\theta_{i,t-1}$ in (7.4.9)
and the diagonal (variance) elements $v_t$ are obtained recursively through the Cholesky
decomposition procedure. In particular, the $v_t$ are given by the recursion

$$v_t = \frac{\gamma_0(w')}{\sigma_a^2} - \sum_{j=1}^{q} \theta_{j,t-1}^2 v_{t-j} \quad \text{for } t > m \qquad (7.4.10)$$

where $\gamma_0(w')/\sigma_a^2 = \text{var}[w_t']/\sigma_a^2 = 1 + \sum_{j=1}^{q} \theta_j^2$.
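
For an MA(1) process the recursions (7.4.9) and (7.4.10) take a particularly simple form, sketched below. The code is an illustration under assumptions of our own: with $p = 0$ and $q = 1$ the band matrix has diagonal elements $1 + \theta^2$ and first off-diagonal elements $-\theta$, so matching the $(t, t-1)$ element of $\mathbf{G}\mathbf{D}\mathbf{G}'$ gives $\theta_{1,t-1} = \theta/v_{t-1}$. The function names are hypothetical.

```python
import math


def ma1_innovations(w, theta):
    """Innovations a_{t|t-1} and scaled variances v_t for w_t = a_t - theta * a_{t-1},
    following (7.4.9) and (7.4.10) with p = 0 and q = 1."""
    innovations, v = [], []
    for t, w_t in enumerate(w):
        if t == 0:
            innovations.append(w_t)              # a_{1|0} = w_1
            v.append(1.0 + theta ** 2)           # v_1 = gamma_0(w') / sigma_a^2
        else:
            coef = theta / v[t - 1]              # theta_{1,t-1} from the Cholesky step
            innovations.append(w_t + coef * innovations[t - 1])
            v.append(1.0 + theta ** 2 - coef ** 2 * v[t - 1])
    return innovations, v


def ma1_exact_loglik(w, theta, sigma2=1.0):
    """Exact Gaussian log-likelihood in the innovations form (7.4.6)."""
    innovations, v = ma1_innovations(w, theta)
    n = len(w)
    weighted_ss = sum(a * a / v_t for a, v_t in zip(innovations, v))
    return (-0.5 * n * math.log(2.0 * math.pi * sigma2)
            - 0.5 * sum(math.log(v_t) for v_t in v)
            - weighted_ss / (2.0 * sigma2))


# Hypothetical short series, for illustration only.
w = [0.3, -0.1, 0.4, 0.2, -0.6]
innovations, v = ma1_innovations(w, theta=0.5)
print([round(v_t, 4) for v_t in v], round(ma1_exact_loglik(w, 0.5), 4))
```

The $v_t$ decrease monotonically toward 1, which illustrates how quickly the finite-sample innovations variance approaches its steady-state value when $|\theta|$ is not close to 1.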

The “innovations” state-space approach to evaluating the exact likelihood function has 
also been shown to be quite useful in dealing with estimation problems for ARMA models 
when the series has missing values; see, for example, Jones (1980), Harvey and Pierse 
(1984), and Wincek and Reinsel (1986). 

The exact likelihood function calculated using the Kalman filtering approach can be
maximized using numerical optimization algorithms. These typically require the first partial
derivatives of the log-likelihood with respect to the unknown parameters, and it is often
beneficial to use analytical derivatives. From the form of the likelihood in (7.4.6), it is
seen that this involves obtaining partial derivatives of the one-step predictions $\hat{w}_{t|t-1}$ and
of the error variances $\sigma_a^2 v_t$ for each $t = 1, \ldots, n$. Wincek and Reinsel (1986) show how the
exact derivatives of $a_{t|t-1} = w_t - \hat{w}_{t|t-1}$ and $\sigma_a^2 v_t = \text{var}[a_{t|t-1}]$ with respect to the model
parameters $\boldsymbol{\phi}$, $\boldsymbol{\theta}$, and $\sigma_a^2$ can be obtained recursively through differentiation of the updating
and prediction equations. This in turn leads to an explicit form of iterative calculations for
the maximum likelihood estimation associated with the likelihood (7.4.6), similar to the
nonlinear least-squares procedures detailed in Section 7.2.


7.5 ESTIMATION USING BAYES’ THEOREM 
7.5.1 Bayes’ Theorem 

In this section, we again use the symbol $\boldsymbol{\xi}$ to represent a general vector of parameters. Bayes'
theorem tells us that if $p(\boldsymbol{\xi})$ is the probability distribution for $\boldsymbol{\xi}$ prior to the collection of the
data, then $p(\boldsymbol{\xi} \mid \mathbf{z})$, the distribution of $\boldsymbol{\xi}$ posterior to the data $\mathbf{z}$, is obtained by combining the
prior distribution $p(\boldsymbol{\xi})$ and the likelihood $L(\boldsymbol{\xi} \mid \mathbf{z})$ in the following way:

$$p(\boldsymbol{\xi} \mid \mathbf{z}) = \frac{p(\boldsymbol{\xi})L(\boldsymbol{\xi} \mid \mathbf{z})}{\int p(\boldsymbol{\xi})L(\boldsymbol{\xi} \mid \mathbf{z})\,d\boldsymbol{\xi}} \qquad (7.5.1)$$

The denominator merely ensures that $p(\boldsymbol{\xi} \mid \mathbf{z})$ integrates to 1. The important part of the
expression is the numerator, from which we see that the posterior distribution is proportional
to the prior distribution multiplied by the likelihood. Savage (1962) showed that prior and
posterior probabilities can be interpreted as subjective probabilities. In particular, often
before the data are available, we have very little knowledge about $\boldsymbol{\xi}$ and we would be
prepared to agree that over the relevant region, it would have appeared a priori just as likely
that $\boldsymbol{\xi}$ had one value as another. In this case, $p(\boldsymbol{\xi})$ could be taken as locally uniform, and
hence $p(\boldsymbol{\xi} \mid \mathbf{z})$ would be proportional to the likelihood.

It should be noted that for this argument to hold, it is not necessary for the prior density
of $\boldsymbol{\xi}$ to be uniform over its entire range (which for some parameters could be infinite). By
requiring that it be locally uniform, we mean that it be approximately uniform in the region
in which the likelihood is appreciable and that it does not take an overwhelmingly large
value outside that region.

Thus, if $\xi$ were the weight of a chair, we could certainly say a priori that it weighed more
than an ounce and less than a ton. It is also likely that when we obtained an observation $z$ by
weighing the chair on a weighing machine, which had an error standard deviation $\sigma$, we
could honestly say that we would have been equally happy with a priori values in the range
$z \pm 3\sigma$. The exception would be if the weighing machine said that an apparently heavy chair
weighed, say, 10 ounces. In this case, the likelihood and the prior would be incompatible,
and we should not, of course, use Bayes' theorem to combine them but would check the
weighing machine and, if this turned out to be accurate, inspect the chair more closely.
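
The effect of a locally uniform prior can be illustrated numerically by evaluating (7.5.1) on a grid. The sketch below is hypothetical throughout: the reading of 12.0, the error standard deviation of 0.5, the grid limits, and the function names are all our own choices, and the likelihood is assumed normal.

```python
import math


def grid_posterior(grid, prior, likelihood):
    """Posterior density (7.5.1) on an equally spaced grid: prior times likelihood,
    divided by a Riemann-sum approximation to the normalizing integral."""
    step = grid[1] - grid[0]
    unnormalized = [prior(x) * likelihood(x) for x in grid]
    total = sum(unnormalized) * step
    return [u / total for u in unnormalized]


z_obs, sigma = 12.0, 0.5                       # hypothetical reading and error s.d.
grid = [8.0 + 0.01 * i for i in range(801)]    # covers z_obs +/- 8 sigma


def likelihood(xi):
    """Normal likelihood of the parameter xi given the single observation z_obs."""
    return math.exp(-0.5 * ((z_obs - xi) / sigma) ** 2)


def flat_prior(xi):
    """Locally uniform prior over the region where the likelihood is appreciable."""
    return 1.0


posterior = grid_posterior(grid, flat_prior, likelihood)
step = grid[1] - grid[0]
post_mean = sum(x * p for x, p in zip(grid, posterior)) * step
post_sd = math.sqrt(sum((x - post_mean) ** 2 * p for x, p in zip(grid, posterior)) * step)
print(round(post_mean, 3), round(post_sd, 3))
```

With the flat prior the posterior is simply the normalized likelihood, so its mean and standard deviation reproduce the observation and the measurement error.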

There is, of course, some arbitrariness in this idea. Suppose that we assumed the
prior distribution of $\xi$ to be locally uniform. This then implies that the distribution of
any linear function of $\xi$ is also locally uniform. However, the prior distribution of some
nonlinear transformation $\alpha = \alpha(\xi)$ (such as $\alpha = \log \xi$) could not be exactly locally uniform.
This arbitrariness will usually have very little effect if we are able to obtain fairly precise
estimates of $\xi$. We will then be considering $\xi$ only over a small range, and over such a
range the transformation from $\xi$ to, say, $\log \xi$ would often be very nearly linear.

Jeffreys (1961) has argued that it is best to choose the metric $\alpha(\xi)$ so that Fisher's
measure of information $I_\alpha = -E[\partial^2 l/\partial\alpha^2]$ is independent of the value of $\alpha$, and hence
of $\xi$. This is equivalent to choosing $\alpha(\xi)$ so that the limiting variance of its maximum
likelihood estimate is independent of $\xi$ and is achieved by choosing the prior distribution
of $\xi$ to be proportional to $I_\xi^{1/2} = \{-E[\partial^2 l/\partial\xi^2]\}^{1/2}$.

Jeffreys justified this choice of prior on the basis of its invariance to the parameterization
employed. Specifically, with this choice, the posterior distributions for $\alpha(\xi)$ and for $\xi$, where
$\alpha(\xi)$ and $\xi$ are connected by a one-to-one transformation, are such that $p(\xi \mid \mathbf{z}) = p(\alpha \mid \mathbf{z})\,
d\alpha/d\xi$. The same result may be obtained (Box and Tiao, 1973) by the following argument.
If for large samples, the expected likelihood function for $\alpha(\xi)$ approaches a normal curve,
then the mean and variance of the curve summarize the information to be expected from the
data. Suppose, now, that a transformation $\alpha(\xi)$ can be found in which the approximating
normal curve has nearly constant variance whatever the true values of the parameter. Then,
in this parameterization, the only information in prospect from the data is conveyed by the
location of the expected likelihood function. To say that we know essentially nothing a
priori relative to this prospective observational information is to say that we regard different
locations of $\alpha$ as equally likely a priori. Equivalently, we say that $\alpha$ should be taken as
locally uniform.

The generalization of Jeffreys' rule to deal with several parameters is that the joint prior
distribution of parameters $\boldsymbol{\xi}$ be taken proportional to

$$|\mathbf{I}_\xi|^{1/2} = \left| -E\left[ \frac{\partial^2 l}{\partial\xi_i\,\partial\xi_j} \right] \right|^{1/2} \qquad (7.5.2)$$

It has been urged (e.g., Jenkins, 1964) that the likelihood itself is best considered and plotted
in that metric $\alpha$ for which $I_\alpha$ is independent of $\alpha$. If this is done, it will be noted that the
likelihood function and the posterior density function with uniform prior are proportional.


7.5.2 Bayesian Estimation of Parameters 


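We now consider the estimation of the parameters in an ARIMA model from a Bayesian
point of view. It is shown in Appendix A7.3 that the exact likelihood of a time series $\mathbf{z}$ of
length $N = n + d$ from an ARIMA($p$, $d$, $q$) process is of the form

$$L(\boldsymbol{\phi}, \boldsymbol{\theta}, \sigma_a \mid \mathbf{z}) = \sigma_a^{-n} f(\boldsymbol{\phi}, \boldsymbol{\theta}) \exp\left[ -\frac{S(\boldsymbol{\phi}, \boldsymbol{\theta})}{2\sigma_a^2} \right] \qquad (7.5.3)$$

where

$$S(\boldsymbol{\phi}, \boldsymbol{\theta}) = \sum_{t=1}^{n} [a_t \mid \mathbf{w}, \boldsymbol{\phi}, \boldsymbol{\theta}]^2 + [\mathbf{e}_*]'\boldsymbol{\Omega}^{-1}[\mathbf{e}_*] \qquad (7.5.4)$$

If we have no prior information about $\sigma_a$, $\boldsymbol{\phi}$, or $\boldsymbol{\theta}$, and since information about $\sigma_a$ would
supply no information about $\boldsymbol{\phi}$ and $\boldsymbol{\theta}$, it is sensible, following Jeffreys, to employ a prior
distribution for $\boldsymbol{\phi}$, $\boldsymbol{\theta}$, and $\sigma_a$ of the form

$$p(\boldsymbol{\phi}, \boldsymbol{\theta}, \sigma_a) \propto |\mathbf{I}(\boldsymbol{\phi}, \boldsymbol{\theta})|^{1/2}\sigma_a^{-1}$$

It follows that the posterior distribution is

$$p(\boldsymbol{\phi}, \boldsymbol{\theta}, \sigma_a \mid \mathbf{z}) \propto \sigma_a^{-(n+1)} |\mathbf{I}(\boldsymbol{\phi}, \boldsymbol{\theta})|^{1/2} f(\boldsymbol{\phi}, \boldsymbol{\theta}) \exp\left[ -\frac{S(\boldsymbol{\phi}, \boldsymbol{\theta})}{2\sigma_a^2} \right] \qquad (7.5.5)$$

If we now integrate (7.5.5) from zero to infinity with respect to $\sigma_a$, we obtain the exact
joint posterior distribution of the parameters $\boldsymbol{\phi}$ and $\boldsymbol{\theta}$ as

$$p(\boldsymbol{\phi}, \boldsymbol{\theta} \mid \mathbf{z}) \propto |\mathbf{I}(\boldsymbol{\phi}, \boldsymbol{\theta})|^{1/2} f(\boldsymbol{\phi}, \boldsymbol{\theta}) \{S(\boldsymbol{\phi}, \boldsymbol{\theta})\}^{-n/2} \qquad (7.5.6)$$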


7.5.3 Autoregressive Processes 

If $z_t$ follows an ARIMA($p$, $d$, 0) process, then $w_t = \nabla^d z_t$ follows a pure AR($p$) process. It
is shown in Appendix A7.4 that for such a process, the factors $|\mathbf{I}(\boldsymbol{\phi})|^{1/2}$ and $f(\boldsymbol{\phi})$, which in
any case are dominated by the term in $S(\boldsymbol{\phi})$, essentially cancel. This yields the remarkably
simple result that given the assumptions, the parameters $\boldsymbol{\phi}$ of the AR($p$) process in $w_t$ have
the posterior distribution

$$p(\boldsymbol{\phi} \mid \mathbf{z}) \propto \{S(\boldsymbol{\phi})\}^{-n/2} \qquad (7.5.7)$$

By this argument, then, the sum-of-squares contours, which are approximate likelihood
contours, are, when nothing is known a priori, also contours of posterior probability.


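Joint Distribution of the Autoregressive Parameters. It is shown in Appendix A7.4 that
for the pure AR process, the least-squares estimates of the $\phi$'s that minimize $S(\boldsymbol{\phi}) = \boldsymbol{\phi}_u'\mathbf{D}\boldsymbol{\phi}_u$
are given by

$$\hat{\boldsymbol{\phi}} = \mathbf{D}_p^{-1}\mathbf{d}$$

where $\boldsymbol{\phi}_u' = (1, \boldsymbol{\phi}')$,

$$\mathbf{d} =
\begin{bmatrix}
D_{12} \\ D_{13} \\ \vdots \\ D_{1,p+1}
\end{bmatrix}
\qquad
\mathbf{D}_p =
\begin{bmatrix}
D_{22} & D_{23} & \cdots & D_{2,p+1} \\
D_{23} & D_{33} & \cdots & D_{3,p+1} \\
\vdots & \vdots & & \vdots \\
D_{2,p+1} & D_{3,p+1} & \cdots & D_{p+1,p+1}
\end{bmatrix} \qquad (7.5.8)$$

$$\mathbf{D} =
\begin{bmatrix}
D_{11} & -\mathbf{d}' \\
-\mathbf{d} & \mathbf{D}_p
\end{bmatrix} \qquad (7.5.9)$$

and

$$D_{ij} = D_{ji} = \tilde{w}_i\tilde{w}_j + \tilde{w}_{i+1}\tilde{w}_{j+1} + \cdots + \tilde{w}_{n+1-j}\tilde{w}_{n+1-i} \qquad (7.5.10)$$

It follows that

$$S(\boldsymbol{\phi}) = \nu s_a^2 + (\boldsymbol{\phi} - \hat{\boldsymbol{\phi}})'\mathbf{D}_p(\boldsymbol{\phi} - \hat{\boldsymbol{\phi}}) \qquad (7.5.11)$$

where

$$s_a^2 = \frac{S(\hat{\boldsymbol{\phi}})}{\nu} \qquad \nu = n - p \qquad (7.5.12)$$

and

$$S(\hat{\boldsymbol{\phi}}) = \hat{\boldsymbol{\phi}}_u'\mathbf{D}\hat{\boldsymbol{\phi}}_u = D_{11} - \hat{\boldsymbol{\phi}}'\mathbf{D}_p\hat{\boldsymbol{\phi}} = D_{11} - \mathbf{d}'\mathbf{D}_p^{-1}\mathbf{d} \qquad (7.5.13)$$

Thus, we can write

$$p(\boldsymbol{\phi} \mid \mathbf{z}) \propto \left[ 1 + \frac{(\boldsymbol{\phi} - \hat{\boldsymbol{\phi}})'\mathbf{D}_p(\boldsymbol{\phi} - \hat{\boldsymbol{\phi}})}{\nu s_a^2} \right]^{-n/2} \qquad (7.5.14)$$

Equivalently,

$$p(\boldsymbol{\phi} \mid \mathbf{z}) \propto \left[ 1 + \frac{\tfrac{1}{2}\sum_i \sum_j S_{ij}(\phi_i - \hat{\phi}_i)(\phi_j - \hat{\phi}_j)}{\nu s_a^2} \right]^{-n/2} \qquad (7.5.15)$$

where

$$S_{ij} = \frac{\partial^2 S(\boldsymbol{\phi})}{\partial\phi_i\,\partial\phi_j} = 2D_{i+1,j+1}$$

It follows that, a posteriori, the parameters of an autoregressive process have a multiple $t$
distribution (A7.1.13), with $\nu = n - p$ degrees of freedom.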

In particular, for the special case $p = 1$, $(\phi - \hat{\phi})/s_{\hat{\phi}}$ is distributed exactly in a Student $t$
distribution with $n - 1$ degrees of freedom where, using the general results given above, $\hat{\phi}$
and $s_{\hat{\phi}}$ are given by

$$\hat{\phi} = \frac{D_{12}}{D_{22}} \qquad
s_{\hat{\phi}} = \left[ \frac{1}{n - 1}\,\frac{D_{11}}{D_{22}} \left( 1 - \frac{D_{12}^2}{D_{11}D_{22}} \right) \right]^{1/2} \qquad (7.5.16)$$

The quantity $s_{\hat{\phi}}$, for large samples, tends to $[(1 - \phi^2)/n]^{1/2}$ and in the sampling theory
framework is identical with the large-sample "standard error" for $\hat{\phi}$. However, when using
this and similar expressions within the Bayesian framework, it is to be remembered that it
is the parameters ($\phi$ in this case) that are random variables. Quantities such as $\hat{\phi}$ and $s_{\hat{\phi}}$,
which are functions of data that have already occurred, are regarded as fixed.
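
The quantities in (7.5.16) are simple functions of the sums of lagged products $D_{ij}$ in (7.5.10), as the following sketch shows. The series is a small hypothetical example that is already mean-corrected, and the function names are our own.

```python
import math


def d_element(w, i, j):
    """D_ij of (7.5.10) for 1-based i, j; w holds the mean-corrected w_1, ..., w_n."""
    n = len(w)
    # Terms run from w_i w_j up to w_{n+1-j} w_{n+1-i}: n + 2 - i - j products.
    return sum(w[i - 1 + k] * w[j - 1 + k] for k in range(n + 2 - i - j))


def ar1_posterior_summary(w):
    """Posterior centre phi_hat and scale s_phi_hat of (7.5.16) for an AR(1) model."""
    n = len(w)
    d11, d12, d22 = d_element(w, 1, 1), d_element(w, 1, 2), d_element(w, 2, 2)
    phi_hat = d12 / d22
    s_phi_hat = math.sqrt((d11 / d22) * (1.0 - d12 ** 2 / (d11 * d22)) / (n - 1))
    return phi_hat, s_phi_hat


# Hypothetical mean-corrected series, for illustration only.
w = [1.0, 0.5, -0.5, -1.0, 0.0, 1.0, 0.5, -1.5]
phi_hat, s_phi_hat = ar1_posterior_summary(w)
print(round(phi_hat, 4), round(s_phi_hat, 4))
```

Given these two numbers, posterior statements about $\phi$ follow from the Student $t$ distribution with $n - 1$ degrees of freedom applied to $(\phi - \hat{\phi})/s_{\hat{\phi}}$.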


Normal Approximation. For samples of size $n > 50$, in which we are usually interested,
the normal approximation to the $t$ distribution is adequate. Thus, very nearly, $\boldsymbol{\phi}$ has a joint
$p$-variate normal distribution $N\{\hat{\boldsymbol{\phi}}, \mathbf{D}_p^{-1}s_a^2\}$ having mean vector $\hat{\boldsymbol{\phi}}$ and variance-covariance
matrix $\mathbf{D}_p^{-1}s_a^2$.

Bayesian Regions of Highest Probability Density. In summarizing what the posterior
distribution has to tell us about the probability of various $\boldsymbol{\phi}$ values, it is useful to indicate a
region of highest probability density, called for short an HPD region (Box and Tiao, 1965).
A Bayesian $1 - \varepsilon$ HPD region has the following properties:

1. Any parameter point inside the region has higher probability density than any point
outside.

2. The total posterior probability mass within the region is $1 - \varepsilon$.


Since $\boldsymbol{\phi}$ has a multiple $t$ distribution, it follows, using the result (A7.1.4), that

$$\Pr\{(\boldsymbol{\phi} - \hat{\boldsymbol{\phi}})'\mathbf{D}_p(\boldsymbol{\phi} - \hat{\boldsymbol{\phi}}) < p s_a^2 F_\varepsilon(p, \nu)\} = 1 - \varepsilon \qquad (7.5.17)$$

defines the exact $1 - \varepsilon$ HPD region for $\boldsymbol{\phi}$. Now, for $\nu = n - p > 100$,

$$pF_\varepsilon(p, \nu) \simeq \chi_\varepsilon^2(p)$$

Also,

$$(\boldsymbol{\phi} - \hat{\boldsymbol{\phi}})'\mathbf{D}_p(\boldsymbol{\phi} - \hat{\boldsymbol{\phi}}) = \frac{1}{2}\sum_i \sum_j S_{ij}(\phi_i - \hat{\phi}_i)(\phi_j - \hat{\phi}_j)$$

Thus, approximately, the HPD region defined in (7.5.17) is such that

$$\sum_i \sum_j S_{ij}(\phi_i - \hat{\phi}_i)(\phi_j - \hat{\phi}_j) < 2s_a^2\chi_\varepsilon^2(p) \qquad (7.5.18)$$

which if we set $\hat{\sigma}_a^2 = s_a^2$ is identical with the confidence region defined by (7.1.25).

Although these approximate regions are identical, it will be remembered that their
interpretation is different. From a sampling theory viewpoint, we say that if a confidence
region is computed according to (7.1.25), then for each of a set of repeated samples, a
proportion $1 - \varepsilon$ of these regions will include the true parameter point. From the Bayesian
viewpoint, we are concerned only with the single sample $\mathbf{z}$, which has actually been
observed. Assuming the relevance of the noninformative prior distribution that we have
taken, the HPD region includes that proportion $1 - \varepsilon$ of the resulting probability distribution
of $\boldsymbol{\phi}$, given $\mathbf{z}$, which has the highest density. In other words, the probability that the value
of $\boldsymbol{\phi}$, which gave rise to the data $\mathbf{z}$, lies in the HPD region is $1 - \varepsilon$.

Using (7.5.11), (7.5.12), and (7.5.18), for large samples the approximate $1 - \varepsilon$ Bayesian
HPD region is bounded by a contour for which

$$S(\boldsymbol{\phi}) = S(\hat{\boldsymbol{\phi}})\left[ 1 + \frac{\chi_\varepsilon^2(p)}{n} \right] \qquad (7.5.19)$$

which corresponds exactly with the confidence region defined by (7.1.27).
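
The step from (7.5.18) to the bounding contour (7.5.19) can be written out as follows. This is a sketch of the algebra only; it assumes, as the text does for large samples, that $\nu = n - p$ may be replaced by $n$.

```latex
\begin{align*}
S(\boldsymbol{\phi})
  &= \nu s_a^2 + (\boldsymbol{\phi}-\hat{\boldsymbol{\phi}})'\mathbf{D}_p(\boldsymbol{\phi}-\hat{\boldsymbol{\phi}})
     && \text{by (7.5.11)} \\
  &= \nu s_a^2 + \tfrac{1}{2}\sum_i\sum_j S_{ij}(\phi_i-\hat{\phi}_i)(\phi_j-\hat{\phi}_j)
     && \text{since } S_{ij} = 2D_{i+1,j+1} \\
  &< \nu s_a^2 + s_a^2\,\chi^2_\varepsilon(p)
     && \text{inside the region (7.5.18)} \\
  &= S(\hat{\boldsymbol{\phi}})\left[1 + \frac{\chi^2_\varepsilon(p)}{\nu}\right]
     && \text{since } \nu s_a^2 = S(\hat{\boldsymbol{\phi}}) \text{ by (7.5.12)} \\
  &\simeq S(\hat{\boldsymbol{\phi}})\left[1 + \frac{\chi^2_\varepsilon(p)}{n}\right]
     && \text{for large } n .
\end{align*}
```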


7.5.4 Moving Average Processes 

If $z_t$ follows an ARIMA(0, $d$, $q$) process, then $w_t = \nabla^d z_t$ follows a pure MA($q$) process.
Because of the duality in estimation results and in the information matrices, in particular,
between the autoregressive model and the moving average model, it follows that in the
moving average case the factors $|\mathbf{I}(\boldsymbol{\theta})|^{1/2}$ and $f(\boldsymbol{\theta})$ in (7.5.6), which in any case are
dominated by $S(\boldsymbol{\theta})$, also cancel for large samples. Thus, corresponding to (7.5.7), we find
that the parameters $\boldsymbol{\theta}$ of the MA($q$) process in $w_t$ have the posterior distribution

$$p(\boldsymbol{\theta} \mid \mathbf{z}) \propto [S(\boldsymbol{\theta})]^{-n/2} \qquad (7.5.20)$$

Again the sum-of-squares contours are, for moderate samples, essentially exact contours
of posterior density. However, because $[a_t]$ is not a linear function of the $\theta$'s, $S(\boldsymbol{\theta})$ will
not be exactly quadratic in $\boldsymbol{\theta}$, though for large samples it will often be nearly so within the
relevant ranges. In that case, we have approximately

$$S(\boldsymbol{\theta}) = \nu s_a^2 + \frac{1}{2}\sum_i \sum_j S_{ij}(\theta_i - \hat{\theta}_i)(\theta_j - \hat{\theta}_j)$$

where $\nu s_a^2 = S(\hat{\boldsymbol{\theta}})$ and $\nu = n - q$. It follows, after substituting for $S(\boldsymbol{\theta})$ in (7.5.20) and using
the exponential approximation, that the following holds:

1. For large samples, $\boldsymbol{\theta}$ is approximately distributed in a multivariate normal distribution
$N\{\hat{\boldsymbol{\theta}}, 2\{S_{ij}\}^{-1}s_a^2\}$.

2. An approximate HPD region is defined by (7.5.18) or (7.5.19), with $q$ replacing $p$,
and $\boldsymbol{\theta}$ replacing $\boldsymbol{\phi}$.

Example: Posterior Distribution of $\lambda = 1 - \theta$ for an IMA(0, 1, 1) Process. To illustrate,
Figure 7.8 shows the approximate posterior density distribution $p(\lambda \mid \mathbf{z})$ from the data of
Series B. It is seen to be approximately normal with its mode at $\hat{\lambda} = 1.09$ and having a
standard deviation of about 0.05. A 95% Bayesian HPD interval covers essentially the same
range, $0.98 < \lambda < 1.19$, as did the 95% confidence interval. Note that the density has been
normalized to have unit area under the curve.

7.5.5 Mixed Processes 

If $z_t$ follows an ARIMA($p$, $d$, $q$) process, then $w_t = \nabla^d z_t$ follows an ARMA($p$, $q$) process

$$\phi(B)w_t = \theta(B)a_t$$

FIGURE 7.8 Posterior density $p(\lambda \mid \mathbf{z})$ for Series B.

It can be shown that for such a process the factors $|\mathbf{I}(\boldsymbol{\phi}, \boldsymbol{\theta})|^{1/2}$ and $f(\boldsymbol{\phi}, \boldsymbol{\theta})$ in (7.5.5) do not
exactly cancel. Instead we can show, based on (7.2.24), that

$$|\mathbf{I}(\boldsymbol{\phi}, \boldsymbol{\theta})|^{1/2} f(\boldsymbol{\phi}, \boldsymbol{\theta}) = J(\boldsymbol{\phi}^* \mid \boldsymbol{\phi}, \boldsymbol{\theta}) \qquad (7.5.21)$$

In (7.5.21), the $\phi^*$'s are the $p + q$ parameters obtained by multiplying the autoregressive
and moving average operators:

$$(1 - \phi_1^* B - \phi_2^* B^2 - \cdots - \phi_{p+q}^* B^{p+q}) = (1 - \phi_1 B - \cdots - \phi_p B^p)(1 - \theta_1 B - \cdots - \theta_q B^q)$$

and $J$ is the Jacobian of the transformation from $\boldsymbol{\phi}^*$ to $(\boldsymbol{\phi}, \boldsymbol{\theta})$, so that

$$p(\boldsymbol{\phi}, \boldsymbol{\theta} \mid \mathbf{z}) \propto J(\boldsymbol{\phi}^* \mid \boldsymbol{\phi}, \boldsymbol{\theta})[S(\boldsymbol{\phi}, \boldsymbol{\theta})]^{-n/2} \qquad (7.5.22)$$

In particular, for the ARMA(1, 1) process, $\phi_1^* = \phi + \theta$, $\phi_2^* = -\phi\theta$, $J = |\phi - \theta|$, and

$$p(\phi, \theta \mid \mathbf{z}) \propto |\phi - \theta|[S(\phi, \theta)]^{-n/2} \qquad (7.5.23)$$

In this case, we see that the Jacobian will dominate in a region close to the line $\phi = \theta$ and
will produce zero density on the line. This is sensible because the sum of squares $S(\phi, \theta)$
will take the finite value $\sum_{t=1}^{n} w_t^2$ for any $\phi = \theta$ and corresponds to our entertaining the
possibility that $w_t$ is white noise. However, in our derivation, we have not constrained the
range of the parameters. The possibility that $\phi = \theta$ is thus associated with unlimited ranges
for the (equal) parameters. The effect of limiting the parameter space by, for example,
introducing the requirements for stationarity and invertibility ($-1 < \phi < 1$, $-1 < \theta < 1$)
would be to produce a small positive value for the density, but this refinement seems
scarcely worthwhile.
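
For the ARMA(1, 1) case the Jacobian quoted above can be verified directly. The following lines are a worked check, using only the definition of the $\phi^*$'s as the coefficients of the product of the two operators.

```latex
\begin{align*}
(1-\phi B)(1-\theta B) &= 1 - (\phi+\theta)B + \phi\theta B^2
  \;\Rightarrow\; \phi_1^* = \phi+\theta,\quad \phi_2^* = -\phi\theta, \\
J(\boldsymbol{\phi}^* \mid \phi,\theta)
  &= \left|\det\begin{bmatrix}
       \partial\phi_1^*/\partial\phi & \partial\phi_1^*/\partial\theta \\
       \partial\phi_2^*/\partial\phi & \partial\phi_2^*/\partial\theta
     \end{bmatrix}\right|
   = \left|\det\begin{bmatrix} 1 & 1 \\ -\theta & -\phi \end{bmatrix}\right|
   = |\theta-\phi| = |\phi-\theta| .
\end{align*}
```

The determinant vanishes exactly when $\phi = \theta$, which is why the posterior density (7.5.23) is zero on that line.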

The Bayesian analysis reinforces the point made in Section 7.3.5 that estimation difficulties
will be encountered with the mixed model and, in particular, with iterative solutions,
when there is near redundancy in the parameters (i.e., near common factors between the
AR and MA parts). We have already seen that the use of preliminary identification will
usually ensure that these situations are avoided.


APPENDIX A7.1 REVIEW OF NORMAL DISTRIBUTION THEORY 
A7.1.1 Partitioning of a Positive-Definite Quadratic Form 

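Consider the positive-definite quadratic form $Q_p = \mathbf{x}'\boldsymbol{\Sigma}^{-1}\mathbf{x}$. Suppose that the $p \times 1$
vector $\mathbf{x}$ is partitioned after the $p_1$th element, so that $\mathbf{x}' = (\mathbf{x}_1' : \mathbf{x}_2') = (x_1, x_2, \ldots, x_{p_1} :
x_{p_1+1}, \ldots, x_p)$, and suppose that the $p \times p$ matrix $\boldsymbol{\Sigma}$ is also partitioned after the $p_1$th row and
column, so that

$$\boldsymbol{\Sigma} =
\begin{bmatrix}
\boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{12} \\
\boldsymbol{\Sigma}_{12}' & \boldsymbol{\Sigma}_{22}
\end{bmatrix}$$

It is readily verified that $\boldsymbol{\Sigma}^{-1}$ can be represented as

$$\boldsymbol{\Sigma}^{-1} =
\begin{bmatrix}
\mathbf{I} & -\boldsymbol{\Sigma}_{11}^{-1}\boldsymbol{\Sigma}_{12} \\
\mathbf{0} & \mathbf{I}
\end{bmatrix}
\begin{bmatrix}
\boldsymbol{\Sigma}_{11}^{-1} & \mathbf{0} \\
\mathbf{0} & (\boldsymbol{\Sigma}_{22} - \boldsymbol{\Sigma}_{12}'\boldsymbol{\Sigma}_{11}^{-1}\boldsymbol{\Sigma}_{12})^{-1}
\end{bmatrix}
\begin{bmatrix}
\mathbf{I} & \mathbf{0} \\
-\boldsymbol{\Sigma}_{12}'\boldsymbol{\Sigma}_{11}^{-1} & \mathbf{I}
\end{bmatrix}$$

Then, since

$$\mathbf{x}'\boldsymbol{\Sigma}^{-1}\mathbf{x} = (\mathbf{x}_1' : \mathbf{x}_2' - \mathbf{x}_1'\boldsymbol{\Sigma}_{11}^{-1}\boldsymbol{\Sigma}_{12})
\begin{bmatrix}
\boldsymbol{\Sigma}_{11}^{-1} & \mathbf{0} \\
\mathbf{0} & (\boldsymbol{\Sigma}_{22} - \boldsymbol{\Sigma}_{12}'\boldsymbol{\Sigma}_{11}^{-1}\boldsymbol{\Sigma}_{12})^{-1}
\end{bmatrix}
\begin{bmatrix}
\mathbf{x}_1 \\
\mathbf{x}_2 - \boldsymbol{\Sigma}_{12}'\boldsymbol{\Sigma}_{11}^{-1}\mathbf{x}_1
\end{bmatrix}$$

$Q_p = \mathbf{x}'\boldsymbol{\Sigma}^{-1}\mathbf{x}$ can always be written as a sum of two quadratic forms $Q_{p_1}$ and $Q_{p_2}$,
containing $p_1$ and $p_2$ elements, respectively, where

$$\begin{aligned}
Q_p &= Q_{p_1} + Q_{p_2} \\
Q_{p_1} &= \mathbf{x}_1'\boldsymbol{\Sigma}_{11}^{-1}\mathbf{x}_1 \\
Q_{p_2} &= (\mathbf{x}_2 - \boldsymbol{\Sigma}_{12}'\boldsymbol{\Sigma}_{11}^{-1}\mathbf{x}_1)'(\boldsymbol{\Sigma}_{22} - \boldsymbol{\Sigma}_{12}'\boldsymbol{\Sigma}_{11}^{-1}\boldsymbol{\Sigma}_{12})^{-1}(\mathbf{x}_2 - \boldsymbol{\Sigma}_{12}'\boldsymbol{\Sigma}_{11}^{-1}\mathbf{x}_1)
\end{aligned} \qquad (A7.1.1)$$

We may also write for the determinant of $\boldsymbol{\Sigma}$

$$|\boldsymbol{\Sigma}| = |\boldsymbol{\Sigma}_{11}||\boldsymbol{\Sigma}_{22} - \boldsymbol{\Sigma}_{12}'\boldsymbol{\Sigma}_{11}^{-1}\boldsymbol{\Sigma}_{12}| \qquad (A7.1.2)$$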
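
A small numerical check of (A7.1.1) and (A7.1.2) is easy to write for the bivariate case $p_1 = p_2 = 1$, where every block is a scalar. The matrix and vector below are hypothetical values chosen for illustration, and the function names are our own.

```python
def quadratic_form_2x2(x, s):
    """x' S^{-1} x for a 2 x 2 positive-definite matrix s, by direct inversion."""
    (s11, s12), (_, s22) = s
    x1, x2 = x
    det = s11 * s22 - s12 * s12
    return (s22 * x1 * x1 - 2.0 * s12 * x1 * x2 + s11 * x2 * x2) / det


def partitioned_forms_2x2(x, s):
    """Q_{p1}, Q_{p2} of (A7.1.1) and the two determinant factors of (A7.1.2)."""
    (s11, s12), (_, s22) = s
    x1, x2 = x
    schur = s22 - s12 * s12 / s11            # Sigma_22 - Sigma_12' Sigma_11^{-1} Sigma_12
    q1 = x1 * x1 / s11
    q2 = (x2 - s12 * x1 / s11) ** 2 / schur
    return q1, q2, s11, schur


sigma = [[2.0, 0.5], [0.5, 1.0]]   # hypothetical covariance matrix
x = (1.0, 2.0)                     # hypothetical vector
q_direct = quadratic_form_2x2(x, sigma)
q1, q2, det1, det2 = partitioned_forms_2x2(x, sigma)
print(q_direct, q1 + q2, det1 * det2)
```

The direct and partitioned evaluations of the quadratic form agree, and the product of the two determinant factors reproduces $|\boldsymbol{\Sigma}|$.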


A7.1.2 Two Useful Integrals 

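Let $\mathbf{z}'\mathbf{C}\mathbf{z}$ be a positive-definite quadratic form in $\mathbf{z}$, which has $q$ elements, so that $\mathbf{z}' =
(z_1, z_2, \ldots, z_q)$, where $-\infty < z_i < \infty$, $i = 1, 2, \ldots, q$, and let $a$, $b$, and $m$ be positive real
numbers. Then, it may be shown that

$$\int_R \left( a + \frac{\mathbf{z}'\mathbf{C}\mathbf{z}}{b} \right)^{-(m+q)/2} d\mathbf{z}
= \frac{(b\pi)^{q/2}\,\Gamma(m/2)}{a^{m/2}|\mathbf{C}|^{1/2}\,\Gamma[(m + q)/2]} \qquad (A7.1.3)$$

where the $q$-fold integral extends over the entire $\mathbf{z}$ space $R$, and

$$\frac{\int_{\mathbf{z}'\mathbf{C}\mathbf{z} > qF_0} (1 + \mathbf{z}'\mathbf{C}\mathbf{z}/m)^{-(m+q)/2}\,d\mathbf{z}}
{\int_R (1 + \mathbf{z}'\mathbf{C}\mathbf{z}/m)^{-(m+q)/2}\,d\mathbf{z}}
= \int_{F_0}^{\infty} p(F \mid q, m)\,dF \qquad (A7.1.4)$$

where the function $p(F \mid q, m)$ is the probability density of the $F$ distribution with $q$ and $m$
degrees of freedom and is defined by

$$p(F \mid q, m) = \frac{(q/m)^{q/2}\,\Gamma[(m + q)/2]}{\Gamma(q/2)\Gamma(m/2)}\,F^{(q-2)/2}\left( 1 + \frac{q}{m}F \right)^{-(m+q)/2} \qquad F > 0 \qquad (A7.1.5)$$

If $m$ tends to infinity, then

$$\left( 1 + \frac{\mathbf{z}'\mathbf{C}\mathbf{z}}{m} \right)^{-(m+q)/2} \quad \text{tends to} \quad e^{-(\mathbf{z}'\mathbf{C}\mathbf{z})/2}$$

and writing $qF = \chi^2$, we obtain from (A7.1.4) that

$$\frac{\int_{\mathbf{z}'\mathbf{C}\mathbf{z} > \chi_0^2} e^{-(\mathbf{z}'\mathbf{C}\mathbf{z})/2}\,d\mathbf{z}}
{\int_R e^{-(\mathbf{z}'\mathbf{C}\mathbf{z})/2}\,d\mathbf{z}}
= \int_{\chi_0^2}^{\infty} p(\chi^2 \mid q)\,d\chi^2 \qquad (A7.1.6)$$

where the function $p(\chi^2 \mid q)$ is the probability density of the $\chi^2$ distribution with $q$ degrees
of freedom, and is defined by

$$p(\chi^2 \mid q) = \frac{1}{2^{q/2}\,\Gamma(q/2)}\,(\chi^2)^{(q-2)/2}e^{-\chi^2/2} \qquad \chi^2 > 0 \qquad (A7.1.7)$$

Here and elsewhere $p(x)$ is used as a general notation to denote a probability density
function for a random variable $x$.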


A7.1.3 Normal Distribution 

The random variable $x$ is normally distributed with mean $\mu$ and standard deviation $\sigma$, or
$N(\mu, \sigma^2)$, if its probability density is

$$p(x) = (2\pi)^{-1/2} (\sigma^2)^{-1/2} e^{-(x-\mu)^2/2\sigma^2} \tag{A7.1.8}$$

Thus, the unit normal variate $u = (x - \mu)/\sigma$ has a distribution $N(0, 1)$. Table E in Part Five
shows ordinates $p(u_\varepsilon)$ and values $u_\varepsilon$ such that $\Pr\{u > u_\varepsilon\} = \varepsilon$ for chosen values of $\varepsilon$.

Multinormal Distribution. The vector $\mathbf{x}' = (x_1, x_2, \ldots, x_p)$ of random variables has a joint
$p$-variate normal distribution $N\{\boldsymbol{\mu}, \boldsymbol{\Sigma}\}$ if its probability density function is

$$p(\mathbf{x}) = (2\pi)^{-p/2}\, |\boldsymbol{\Sigma}|^{-1/2}\, e^{-(\mathbf{x}-\boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})/2} \tag{A7.1.9}$$

The multinormal variate $\mathbf{x}$ has mean vector $\boldsymbol{\mu} = E[\mathbf{x}]$ and variance-covariance matrix
$\boldsymbol{\Sigma} = \operatorname{cov}[\mathbf{x}]$. The probability density contours are ellipsoids defined by
$(\mathbf{x} - \boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{x} - \boldsymbol{\mu}) = \text{constant}$. For illustration, the elliptical contours for a bivariate
($p = 2$) normal distribution are shown in Figure A7.1.

At the point $\mathbf{x} = \boldsymbol{\mu}$, the multivariate normal distribution has its maximum density

$$\max p(\mathbf{x}) = p(\boldsymbol{\mu}) = (2\pi)^{-p/2}\, |\boldsymbol{\Sigma}|^{-1/2}$$

The $\chi^2$ Distribution as the Probability Mass Outside a Density Contour of the Multivariate
Normal. For the $p$-variate normal distribution, (A7.1.9), the probability mass outside
the density contour defined by

$$(\mathbf{x} - \boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{x} - \boldsymbol{\mu}) = \chi_0^2$$

is given by the $\chi^2$ integral with $p$ degrees of freedom:

$$\int_{\chi_0^2}^{\infty} p(\chi^2|p)\, d\chi^2$$

where the $\chi^2$ density function is defined as in (A7.1.7). Table F in Part Five shows values
of $\chi_\varepsilon^2(p)$, such that $\Pr\{\chi^2 > \chi_\varepsilon^2(p)\} = \varepsilon$ for chosen values of $\varepsilon$.

[Figure A7.1: contours of the bivariate distribution $p(x_1, x_2)$, the marginal distribution $p(x_1)$,
and the regression line $\mu_{2.1} = \mu_2 + \beta_{2.1}(x_1 - \mu_1)$.]

FIGURE A7.1 Contours of a bivariate normal distribution showing the marginal distribution $p(x_1)$
and the conditional distribution $p(x_2|x_{10})$ at $x_1 = x_{10}$.


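Marginal and Conditional Distributions for the Multivariate Normal Distribution. Suppose
that the vector of $p = p_1 + p_2$ random variables is partitioned after the first $p_1$ elements,
so that

$$\mathbf{x}' = (\mathbf{x}_1' : \mathbf{x}_2') = (x_1, x_2, \ldots, x_{p_1} : x_{p_1+1}, \ldots, x_{p_1+p_2})$$

and that the variance-covariance matrix is

$$\boldsymbol{\Sigma} = \begin{bmatrix} \boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{12} \\ \boldsymbol{\Sigma}_{12}' & \boldsymbol{\Sigma}_{22} \end{bmatrix}$$

Then using (A7.1.1) and (A7.1.2), we can write the multivariate normal distribution for
the $p = p_1 + p_2$ variates as the marginal distribution of $\mathbf{x}_1$ multiplied by the conditional
distribution of $\mathbf{x}_2$ given $\mathbf{x}_1$, that is,

$$\begin{aligned}
p(\mathbf{x}) = p(\mathbf{x}_1, \mathbf{x}_2) &= p(\mathbf{x}_1)\, p(\mathbf{x}_2|\mathbf{x}_1) \\
&= (2\pi)^{-p_1/2}\, |\boldsymbol{\Sigma}_{11}|^{-1/2} \exp\left[-\frac{(\mathbf{x}_1 - \boldsymbol{\mu}_1)'\boldsymbol{\Sigma}_{11}^{-1}(\mathbf{x}_1 - \boldsymbol{\mu}_1)}{2}\right] \\
&\quad \times (2\pi)^{-p_2/2}\, |\boldsymbol{\Sigma}_{22.11}|^{-1/2} \exp\left[-\frac{(\mathbf{x}_2 - \boldsymbol{\mu}_{2.1})'\boldsymbol{\Sigma}_{22.11}^{-1}(\mathbf{x}_2 - \boldsymbol{\mu}_{2.1})}{2}\right]
\end{aligned} \tag{A7.1.10}$$

where

$$\boldsymbol{\Sigma}_{22.11} = \boldsymbol{\Sigma}_{22} - \boldsymbol{\Sigma}_{12}'\boldsymbol{\Sigma}_{11}^{-1}\boldsymbol{\Sigma}_{12} \tag{A7.1.11}$$

and $\boldsymbol{\mu}_{2.1} = \boldsymbol{\mu}_2 + \boldsymbol{\beta}_{2.1}(\mathbf{x}_1 - \boldsymbol{\mu}_1) = E[\mathbf{x}_2|\mathbf{x}_1]$ define regression hyperplanes in $(p_1 + p_2)$-
dimensional space, tracing the loci of the (conditional) means of the $p_2$ elements of $\mathbf{x}_2$
as the $p_1$ elements of $\mathbf{x}_1$ vary. The $p_2 \times p_1$ matrix of regression coefficients is given by

$$\boldsymbol{\beta}_{2.1} = \boldsymbol{\Sigma}_{12}'\boldsymbol{\Sigma}_{11}^{-1}$$

Both marginal and conditional distributions for the multivariate normal are therefore
multivariate normal distributions. It is seen that for the multivariate normal distribution,
the conditional distribution $p(\mathbf{x}_2|\mathbf{x}_1)$ is, except for location (i.e., mean value), identical
whatever the value of $\mathbf{x}_1$ (i.e., multivariate normal with identical variance-covariance
matrix $\boldsymbol{\Sigma}_{22.11}$).

Univariate Marginals. In particular, the marginal density for a single element $x_i$ ($i =
1, 2, \ldots, p$) is $N(\mu_i, \sigma_i^2)$, a univariate normal with mean $\mu_i$ equal to the $i$th element of $\boldsymbol{\mu}$ and
variance $\sigma_i^2$ equal to the $i$th diagonal element of $\boldsymbol{\Sigma}$.

Bivariate Normal. For illustration, the marginal and conditional distributions for a bivariate
normal are shown in Figure A7.1. In this case, the marginal distribution of $x_1$ is $N(\mu_1, \sigma_1^2)$,
while the conditional distribution of $x_2$ given $x_1$ is

$$N\{\mu_2 + \beta_{2.1}(x_1 - \mu_1),\ \sigma_2^2(1 - \rho^2)\}$$

where $\rho = (\sigma_1/\sigma_2)\beta_{2.1}$ is the correlation coefficient between $x_1$ and $x_2$ and $\beta_{2.1} = \sigma_{12}/\sigma_1^2$
is the regression coefficient of $x_2$ on $x_1$.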


A7.1.4 Student’s t Distribution 

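The random variable $x$ is distributed as a scaled $t$ distribution with mean $\mu$ and scale
parameter $s$ and with $\nu$ degrees of freedom, denoted as $t(\mu, s^2, \nu)$, if its probability density
is

$$p(x) = (2\pi)^{-1/2} (s^2)^{-1/2} \left(\frac{\nu}{2}\right)^{-1/2} \Gamma\left(\frac{\nu+1}{2}\right) \Gamma^{-1}\left(\frac{\nu}{2}\right)
\left[1 + \frac{(x - \mu)^2}{\nu s^2}\right]^{-(\nu+1)/2} \tag{A7.1.12}$$

Thus, the standardized $t$ variate $t = (x - \mu)/s$ has distribution $t(0, 1, \nu)$. Table G in Part
Five shows values $t_\varepsilon$ such that $\Pr\{t > t_\varepsilon\} = \varepsilon$ for chosen values of $\varepsilon$.

Approach to Normal Distribution. For large $\nu$, the product

$$\left(\frac{\nu}{2}\right)^{-1/2} \Gamma\left(\frac{\nu+1}{2}\right) \Gamma^{-1}\left(\frac{\nu}{2}\right)$$

tends to unity, while the right-hand bracket in (A7.1.12) tends to $e^{-(1/2s^2)(x-\mu)^2}$. Thus, if
for large $\nu$ we write $s^2 = \sigma^2$, the $t$ distribution tends to the normal distribution (A7.1.8).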
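The limiting behavior can be illustrated numerically. The sketch below, in Python with only the standard library, evaluates the density (A7.1.12) on the log scale and compares it with the normal density (A7.1.8); the function names and test values are assumptions chosen for illustration.

```python
import math


def scaled_t_pdf(x, mu, s2, nu):
    """Density (A7.1.12) of the scaled t distribution t(mu, s2, nu)."""
    log_const = (-0.5 * math.log(2.0 * math.pi) - 0.5 * math.log(s2)
                 - 0.5 * math.log(nu / 2.0)
                 + math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0))
    # log1p keeps the bracket accurate when nu is very large.
    log_bracket = -0.5 * (nu + 1.0) * math.log1p((x - mu) ** 2 / (nu * s2))
    return math.exp(log_const + log_bracket)


def normal_pdf(x, mu, sigma2):
    """Density (A7.1.8) of N(mu, sigma2)."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)


# With few degrees of freedom the two densities differ visibly at the center;
# with nu very large they agree closely when s2 is identified with sigma2.
gap_small_nu = abs(scaled_t_pdf(0.0, 0.0, 1.0, 3.0) - normal_pdf(0.0, 0.0, 1.0))
gap_large_nu = abs(scaled_t_pdf(0.8, 0.2, 1.5, 1.0e7) - normal_pdf(0.8, 0.2, 1.5))
```

For $\nu = 1$ the same function reduces to the Cauchy density, which gives a convenient sanity check at $x = \mu$.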

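Multiple t Distribution. Let $\boldsymbol{\mu}' = (\mu_1, \mu_2, \ldots, \mu_p)$ be a $p \times 1$ vector and $\mathbf{S}$ a $p \times p$ positive-
definite matrix. Then, the vector random variable $\mathbf{x}$ has a scaled multivariate $t$ distribution
$t(\boldsymbol{\mu}, \mathbf{S}, \nu)$, with mean vector $\boldsymbol{\mu}$, scaling matrix $\mathbf{S}$, and $\nu$ degrees of freedom if its probability
density is

$$p(\mathbf{x}) = (2\pi)^{-p/2}\, |\mathbf{S}|^{-1/2} \left(\frac{\nu}{2}\right)^{-p/2} \Gamma\left(\frac{\nu+p}{2}\right) \Gamma^{-1}\left(\frac{\nu}{2}\right)
\left[1 + \frac{(\mathbf{x} - \boldsymbol{\mu})'\mathbf{S}^{-1}(\mathbf{x} - \boldsymbol{\mu})}{\nu}\right]^{-(\nu+p)/2} \tag{A7.1.13}$$

The probability contours of the multiple $t$ distribution are ellipsoids defined by
$(\mathbf{x} - \boldsymbol{\mu})'\mathbf{S}^{-1}(\mathbf{x} - \boldsymbol{\mu}) = \text{constant}$.

Approach to the Multinormal Form. For large $\nu$, the product

$$\left(\frac{\nu}{2}\right)^{-p/2} \Gamma\left(\frac{\nu+p}{2}\right) \Gamma^{-1}\left(\frac{\nu}{2}\right)$$

tends to unity; also, the right-hand bracket in (A7.1.13) tends to $e^{-(\mathbf{x}-\boldsymbol{\mu})'\mathbf{S}^{-1}(\mathbf{x}-\boldsymbol{\mu})/2}$. Thus,
if for large $\nu$ we write $\mathbf{S} = \boldsymbol{\Sigma}$, the multiple $t$ tends to the multivariate normal distribution
(A7.1.9).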


APPENDIX A7.2 REVIEW OF LINEAR LEAST-SQUARES THEORY 

A7.2.1 Normal Equations and Least Squares 

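The linear regression model is assumed to be

$$w_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + e_i \tag{A7.2.1}$$

where the $w_i$ ($i = 1, 2, \ldots, n$) are observations on a response or dependent variable
obtained from an experiment in which the independent variables $x_{i1}, x_{i2}, \ldots, x_{ik}$
take on known fixed values, the $\beta_i$ are unknown parameters to be estimated from the
data, and the $e_i$ are uncorrelated random errors having zero means and the same common
variance $\sigma^2$.

The relations (A7.2.1) may be expressed in matrix form as

$$\begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{bmatrix} =
\begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1k} \\ x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nk} \end{bmatrix}
\begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{bmatrix} +
\begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix}$$

or

$$\mathbf{w} = \mathbf{X}\boldsymbol{\beta} + \mathbf{e} \tag{A7.2.2}$$

where the $n \times k$ matrix $\mathbf{X}$ is assumed to be of full rank $k$. Gauss's theorem of least squares
may be stated (Barnard, 1963) in the following form: The estimates $\hat{\boldsymbol{\beta}}' = (\hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_k)$
of the parameters $\boldsymbol{\beta}$, which are linear in the observations and unbiased for $\boldsymbol{\beta}$ and which
minimize the mean square error among all such estimates of any linear function
$\lambda_1\beta_1 + \lambda_2\beta_2 + \cdots + \lambda_k\beta_k$ of the parameters, are obtained by minimizing the sum of squares

$$S(\boldsymbol{\beta}) = \mathbf{e}'\mathbf{e} = (\mathbf{w} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{w} - \mathbf{X}\boldsymbol{\beta}) \tag{A7.2.3}$$

To establish the minimum of $S(\boldsymbol{\beta})$, we note that the vector $\mathbf{w} - \mathbf{X}\boldsymbol{\beta}$ may be decomposed
into two vectors $\mathbf{w} - \mathbf{X}\hat{\boldsymbol{\beta}}$ and $\mathbf{X}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})$ according to

$$\mathbf{w} - \mathbf{X}\boldsymbol{\beta} = \mathbf{w} - \mathbf{X}\hat{\boldsymbol{\beta}} + \mathbf{X}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}) \tag{A7.2.4}$$

Hence, provided that we choose $\hat{\boldsymbol{\beta}}$ so that $\mathbf{X}'(\mathbf{w} - \mathbf{X}\hat{\boldsymbol{\beta}}) = \mathbf{0}$, that is,

$$(\mathbf{X}'\mathbf{X})\hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{w} \tag{A7.2.5}$$

it follows that

$$S(\boldsymbol{\beta}) = S(\hat{\boldsymbol{\beta}}) + (\boldsymbol{\beta} - \hat{\boldsymbol{\beta}})'\mathbf{X}'\mathbf{X}(\boldsymbol{\beta} - \hat{\boldsymbol{\beta}}) \tag{A7.2.6}$$

and the vectors $\mathbf{w} - \mathbf{X}\hat{\boldsymbol{\beta}}$ and $\mathbf{X}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})$ are orthogonal. Since the second term on the right-
hand side of (A7.2.6) is a positive-definite quadratic form, it follows that the minimum is
attained when $\boldsymbol{\beta} = \hat{\boldsymbol{\beta}}$, where

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{w}$$

is the least-squares estimate of $\boldsymbol{\beta}$ given by the solution to the normal equation (A7.2.5).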
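The normal equations (A7.2.5) and the decomposition (A7.2.6) can be illustrated with a small numerical example. The following Python sketch uses only the standard library; the helper names (`solve`, `least_squares`, `sum_of_squares`) and the data are hypothetical and chosen for illustration only.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def least_squares(X, w):
    """Return (beta_hat, X'X) from the normal equations (X'X) beta_hat = X'w."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xtw = [sum(row[i] * wi for row, wi in zip(X, w)) for i in range(k)]
    return solve(XtX, Xtw), XtX


def sum_of_squares(X, w, beta):
    """S(beta) = (w - X beta)'(w - X beta), as in (A7.2.3)."""
    return sum((wi - sum(b * x for b, x in zip(beta, row))) ** 2
               for row, wi in zip(X, w))


# Illustrative data (assumed): straight-line fit with an intercept column.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
w = [1.0, 2.9, 5.1, 7.0]
beta_hat, XtX = least_squares(X, w)

# Check (A7.2.6) at an arbitrary trial value of beta.
beta = [0.5, 2.5]
d = [b - bh for b, bh in zip(beta, beta_hat)]
quad = sum(d[i] * XtX[i][j] * d[j] for i in range(2) for j in range(2))
lhs = sum_of_squares(X, w, beta)
rhs = sum_of_squares(X, w, beta_hat) + quad
```

For these data the normal equations give an intercept of 0.97 and a slope of 2.02, and `lhs` and `rhs` agree up to rounding. Dividing `sum_of_squares(X, w, beta_hat)` by $n - k = 2$ gives the estimate $s^2$ of Section A7.2.2.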

A7.2.2 Estimation of Error Variance 

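Using (A7.2.3) and (A7.2.5), the sum of squares at the minimum is

$$S(\hat{\boldsymbol{\beta}}) = (\mathbf{w} - \mathbf{X}\hat{\boldsymbol{\beta}})'(\mathbf{w} - \mathbf{X}\hat{\boldsymbol{\beta}}) = \mathbf{w}'\mathbf{w} - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} \tag{A7.2.7}$$

Furthermore, if we define

$$s^2 = \frac{S(\hat{\boldsymbol{\beta}})}{n - k} \tag{A7.2.8}$$

it may be shown that $E[s^2] = \sigma^2$, and hence $s^2$ provides an unbiased estimate of the error
variance $\sigma^2$.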


A7.2.3 Covariance Matrix of Least-Squares Estimates 

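The covariance matrix of the least-squares estimates $\hat{\boldsymbol{\beta}}$ is defined by

$$\begin{aligned}
\mathbf{V}(\hat{\boldsymbol{\beta}}) &= \operatorname{cov}[\hat{\boldsymbol{\beta}}, \hat{\boldsymbol{\beta}}'] \\
&= \operatorname{cov}[(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{w},\ \mathbf{w}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}] \\
&= (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\operatorname{cov}[\mathbf{w}, \mathbf{w}']\,\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1} \\
&= (\mathbf{X}'\mathbf{X})^{-1}\sigma^2
\end{aligned} \tag{A7.2.9}$$

since $\operatorname{cov}[\mathbf{w}, \mathbf{w}'] = \mathbf{I}\sigma^2$.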

A7.2.4 Confidence Regions 

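Assuming normality, the quadratic forms $S(\hat{\boldsymbol{\beta}})$ and $(\boldsymbol{\beta} - \hat{\boldsymbol{\beta}})'\mathbf{X}'\mathbf{X}(\boldsymbol{\beta} - \hat{\boldsymbol{\beta}})$ in (A7.2.6) are
independently distributed as $\sigma^2$ times chi-squared random variables with $n - k$ and $k$
degrees of freedom, respectively. Hence,

$$\frac{(\boldsymbol{\beta} - \hat{\boldsymbol{\beta}})'\mathbf{X}'\mathbf{X}(\boldsymbol{\beta} - \hat{\boldsymbol{\beta}})}{S(\hat{\boldsymbol{\beta}})}\, \frac{n - k}{k}$$

is distributed as $F(k, n - k)$. Using (A7.2.8), it follows that

$$(\boldsymbol{\beta} - \hat{\boldsymbol{\beta}})'\mathbf{X}'\mathbf{X}(\boldsymbol{\beta} - \hat{\boldsymbol{\beta}}) \le k s^2 F_\varepsilon(k, n - k) \tag{A7.2.10}$$

defines a $1 - \varepsilon$ confidence region for $\boldsymbol{\beta}$.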


A7.2.5 Correlated Errors 

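Suppose that the errors $\mathbf{e}$ in (A7.2.2) have a known covariance matrix $\mathbf{V}$, and let $\mathbf{P}$ be an
$n \times n$ nonsingular matrix such that $\mathbf{V}^{-1} = \mathbf{P}\mathbf{P}'/\sigma^2$, so that $\mathbf{P}'\mathbf{V}\mathbf{P} = \mathbf{I}\sigma^2$. Then, (A7.2.2) may
be transformed into

$$\mathbf{P}'\mathbf{w} = \mathbf{P}'\mathbf{X}\boldsymbol{\beta} + \mathbf{P}'\mathbf{e}$$

or

$$\mathbf{w}^* = \mathbf{X}^*\boldsymbol{\beta} + \mathbf{e}^* \tag{A7.2.11}$$

where $\mathbf{w}^* = \mathbf{P}'\mathbf{w}$ and $\mathbf{X}^* = \mathbf{P}'\mathbf{X}$. The covariance matrix of $\mathbf{e}^* = \mathbf{P}'\mathbf{e}$ in (A7.2.11) is

$$\operatorname{cov}[\mathbf{P}'\mathbf{e}, \mathbf{e}'\mathbf{P}] = \mathbf{P}'\operatorname{cov}[\mathbf{e}, \mathbf{e}']\mathbf{P} = \mathbf{P}'\mathbf{V}\mathbf{P} = \mathbf{I}\sigma^2$$

Hence, we may apply ordinary least-squares theory with $\mathbf{V} = \mathbf{I}\sigma^2$ to the transformed model
(A7.2.11), in which $\mathbf{w}$ is replaced by $\mathbf{w}^* = \mathbf{P}'\mathbf{w}$ and $\mathbf{X}$ by $\mathbf{X}^* = \mathbf{P}'\mathbf{X}$. Thus, we obtain the
estimates

$$\hat{\boldsymbol{\beta}}_G = (\mathbf{X}^{*\prime}\mathbf{X}^*)^{-1}\mathbf{X}^{*\prime}\mathbf{w}^*$$

with $\mathbf{V}(\hat{\boldsymbol{\beta}}_G) = \operatorname{cov}[\hat{\boldsymbol{\beta}}_G] = \sigma^2(\mathbf{X}^{*\prime}\mathbf{X}^*)^{-1}$. In terms of the original variables $\mathbf{X}$ and $\mathbf{w}$ of the
regression model, since $\mathbf{P}\mathbf{P}' = \sigma^2\mathbf{V}^{-1}$, the estimate is

$$\hat{\boldsymbol{\beta}}_G = (\mathbf{X}'\mathbf{P}\mathbf{P}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{P}\mathbf{P}'\mathbf{w} = (\mathbf{X}'\mathbf{V}^{-1}\mathbf{X})^{-1}\mathbf{X}'\mathbf{V}^{-1}\mathbf{w} \tag{A7.2.12}$$

with

$$\mathbf{V}(\hat{\boldsymbol{\beta}}_G) = \operatorname{cov}[\hat{\boldsymbol{\beta}}_G] = (\mathbf{X}'\mathbf{V}^{-1}\mathbf{X})^{-1}$$

The estimator $\hat{\boldsymbol{\beta}}_G$ in (A7.2.12) is generally referred to as the generalized least-squares
(GLS) estimator, and it follows that this is the estimate of $\boldsymbol{\beta}$ obtained by minimizing the
generalized sum of squares function

$$S(\boldsymbol{\beta}|\mathbf{V}) = (\mathbf{w} - \mathbf{X}\boldsymbol{\beta})'\mathbf{V}^{-1}(\mathbf{w} - \mathbf{X}\boldsymbol{\beta})$$


APPENDIX A7.3 EXACT LIKELIHOOD FUNCTION FOR MOVING
AVERAGE AND MIXED PROCESSES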


To obtain the required likelihood function for an MA($q$) model, we have to derive the
probability density function for a series $\mathbf{w}' = (w_1, w_2, \ldots, w_n)$ assumed to be generated by
an invertible moving average model of order $q$:

$$\tilde{w}_t = a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2} - \cdots - \theta_q a_{t-q} \tag{A7.3.1}$$

where $\tilde{w}_t = w_t - \mu$, with $\mu = E[w_t]$. Under the assumption that the $a_t$'s and the $\tilde{w}_t$'s are
normally distributed, the joint density may be written as

$$p(\mathbf{w}|\boldsymbol{\theta}, \sigma_a^2, \mu) = (2\pi\sigma_a^2)^{-n/2}\, |\mathbf{M}_n^{(0,q)}|^{1/2}
\exp\left[-\frac{\tilde{\mathbf{w}}'\mathbf{M}_n^{(0,q)}\tilde{\mathbf{w}}}{2\sigma_a^2}\right] \tag{A7.3.2}$$

where $(\mathbf{M}_n^{(p,q)})^{-1}\sigma_a^2$ denotes the $n \times n$ covariance matrix of the $w_t$'s for an ARMA($p$, $q$)
process. We now consider a convenient way of evaluating $\tilde{\mathbf{w}}'\mathbf{M}_n^{(0,q)}\tilde{\mathbf{w}}$, and for simplicity,
we suppose that $\mu = 0$, so that $\tilde{w}_t = w_t$.

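Using the model (A7.3.1), we can write down the $n$ equations:

$$w_t = a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2} - \cdots - \theta_q a_{t-q} \qquad (t = 1, 2, \ldots, n)$$

These $n$ equations can be conveniently expressed in matrix form in terms of the
$n$-dimensional vectors $\mathbf{w}' = (w_1, w_2, \ldots, w_n)$ and $\mathbf{a}' = (a_1, a_2, \ldots, a_n)$, and the
$q$-dimensional vector of preliminary values $\mathbf{a}_*' = (a_{1-q}, a_{2-q}, \ldots, a_0)$ as

$$\mathbf{w} = \mathbf{L}_\theta\mathbf{a} + \mathbf{F}\mathbf{a}_*$$

where $\mathbf{L}_\theta$ is an $n \times n$ lower triangular matrix with 1's on the leading diagonal, $-\theta_1$ on the
first subdiagonal, $-\theta_2$ on the second subdiagonal, and so on, with $\theta_i = 0$ for $i > q$. Further,
$\mathbf{F}$ is an $n \times q$ matrix with the form $\mathbf{F} = (\mathbf{B}_q', \mathbf{0}')'$, where $\mathbf{B}_q$ is $q \times q$ equal to

$$\mathbf{B}_q = -\begin{bmatrix}
\theta_q & \theta_{q-1} & \cdots & \theta_1 \\
0 & \theta_q & \cdots & \theta_2 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \theta_q
\end{bmatrix}$$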

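Now the joint distribution of the $n + q$ values, which are the elements of $(\mathbf{a}', \mathbf{a}_*')$, is

$$p(\mathbf{a}, \mathbf{a}_*|\sigma_a^2) = (2\pi\sigma_a^2)^{-(n+q)/2} \exp\left[-\frac{1}{2\sigma_a^2}(\mathbf{a}'\mathbf{a} + \mathbf{a}_*'\mathbf{a}_*)\right]$$

Noting that the transformation from $(\mathbf{a}, \mathbf{a}_*)$ to $(\mathbf{w}, \mathbf{a}_*)$ has unit Jacobian and
$\mathbf{a} = \mathbf{L}_\theta^{-1}(\mathbf{w} - \mathbf{F}\mathbf{a}_*)$, the joint distribution of $\mathbf{w} = \mathbf{L}_\theta\mathbf{a} + \mathbf{F}\mathbf{a}_*$ and $\mathbf{a}_*$ is

$$p(\mathbf{w}, \mathbf{a}_*|\boldsymbol{\theta}, \sigma_a^2) = (2\pi\sigma_a^2)^{-(n+q)/2} \exp\left[-\frac{1}{2\sigma_a^2} S(\boldsymbol{\theta}, \mathbf{a}_*)\right]$$

where

$$S(\boldsymbol{\theta}, \mathbf{a}_*) = (\mathbf{w} - \mathbf{F}\mathbf{a}_*)'\mathbf{L}_\theta'^{-1}\mathbf{L}_\theta^{-1}(\mathbf{w} - \mathbf{F}\mathbf{a}_*) + \mathbf{a}_*'\mathbf{a}_* \tag{A7.3.3}$$

Now, let $\hat{\mathbf{a}}_*$ be the vector of values that minimize $S(\boldsymbol{\theta}, \mathbf{a}_*)$, which from generalized least-
squares theory can be shown to equal $\hat{\mathbf{a}}_* = \mathbf{D}^{-1}\mathbf{F}'\mathbf{L}_\theta'^{-1}\mathbf{L}_\theta^{-1}\mathbf{w}$, where $\mathbf{D} = \mathbf{I}_q + \mathbf{F}'\mathbf{L}_\theta'^{-1}\mathbf{L}_\theta^{-1}\mathbf{F}$.
Then, using the result (A7.2.6), we have

$$S(\boldsymbol{\theta}, \mathbf{a}_*) = S(\boldsymbol{\theta}) + (\mathbf{a}_* - \hat{\mathbf{a}}_*)'\mathbf{D}(\mathbf{a}_* - \hat{\mathbf{a}}_*)$$

where

$$S(\boldsymbol{\theta}) = S(\boldsymbol{\theta}, \hat{\mathbf{a}}_*) = (\mathbf{w} - \mathbf{F}\hat{\mathbf{a}}_*)'\mathbf{L}_\theta'^{-1}\mathbf{L}_\theta^{-1}(\mathbf{w} - \mathbf{F}\hat{\mathbf{a}}_*) + \hat{\mathbf{a}}_*'\hat{\mathbf{a}}_* \tag{A7.3.4}$$

is a function of the observations $\mathbf{w}$ but not of the preliminary values $\mathbf{a}_*$. Thus,

$$p(\mathbf{w}, \mathbf{a}_*|\boldsymbol{\theta}, \sigma_a^2) = (2\pi\sigma_a^2)^{-(n+q)/2}
\exp\left\{-\frac{1}{2\sigma_a^2}\left[S(\boldsymbol{\theta}) + (\mathbf{a}_* - \hat{\mathbf{a}}_*)'\mathbf{D}(\mathbf{a}_* - \hat{\mathbf{a}}_*)\right]\right\}$$

However, since the joint distribution of $\mathbf{w}$ and $\mathbf{a}_*$ can be factored as

$$p(\mathbf{w}, \mathbf{a}_*|\boldsymbol{\theta}, \sigma_a^2) = p(\mathbf{w}|\boldsymbol{\theta}, \sigma_a^2)\, p(\mathbf{a}_*|\mathbf{w}, \boldsymbol{\theta}, \sigma_a^2)$$

it follows, similar to (A7.1.10), that

$$p(\mathbf{a}_*|\mathbf{w}, \boldsymbol{\theta}, \sigma_a^2) = (2\pi\sigma_a^2)^{-q/2}\, |\mathbf{D}|^{1/2}
\exp\left[-\frac{1}{2\sigma_a^2}(\mathbf{a}_* - \hat{\mathbf{a}}_*)'\mathbf{D}(\mathbf{a}_* - \hat{\mathbf{a}}_*)\right] \tag{A7.3.5}$$

$$p(\mathbf{w}|\boldsymbol{\theta}, \sigma_a^2) = (2\pi\sigma_a^2)^{-n/2}\, |\mathbf{D}|^{-1/2}
\exp\left[-\frac{1}{2\sigma_a^2} S(\boldsymbol{\theta})\right] \tag{A7.3.6}$$

We can now deduce the following: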


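1. From (A7.3.5), we see that $\hat{\mathbf{a}}_*$ is the conditional expectation of $\mathbf{a}_*$ given $\mathbf{w}$ and $\boldsymbol{\theta}$.
Thus, using the notation introduced in Section 7.1.4, we obtain

$$\hat{\mathbf{a}}_* = E[\mathbf{a}_*|\mathbf{w}, \boldsymbol{\theta}] = [\mathbf{a}_*]$$

where $[\mathbf{a}] = \mathbf{L}_\theta^{-1}(\mathbf{w} - \mathbf{F}[\mathbf{a}_*])$ is the conditional expectation of $\mathbf{a}$ given $\mathbf{w}$ and $\boldsymbol{\theta}$, and
using (A7.3.4):

$$S(\boldsymbol{\theta}) = [\mathbf{a}]'[\mathbf{a}] + [\mathbf{a}_*]'[\mathbf{a}_*] = \sum_{t=1-q}^{n} [a_t]^2 \tag{A7.3.7}$$

To compute $S(\boldsymbol{\theta})$, the quantities $[a_t] = E[a_t|\mathbf{w}, \boldsymbol{\theta}]$ may be obtained by using the
estimates $[\mathbf{a}_*]' = ([a_{1-q}], [a_{2-q}], \ldots, [a_0])$ obtained as above by back-forecasting for
preliminary values, and computing the elements $[a_1], [a_2], \ldots, [a_n]$ of $[\mathbf{a}]$ recursively
from the relation $\mathbf{L}_\theta[\mathbf{a}] = \mathbf{w} - \mathbf{F}[\mathbf{a}_*]$ as

$$[a_t] = w_t + \theta_1[a_{t-1}] + \cdots + \theta_q[a_{t-q}] \qquad (t = 1, 2, \ldots, n)$$

Note that if the expression for $\hat{\mathbf{a}}_*$ is utilized in (A7.3.4), after rearranging we obtain

$$S(\boldsymbol{\theta}) = \mathbf{w}'\mathbf{L}_\theta'^{-1}(\mathbf{I}_n - \mathbf{L}_\theta^{-1}\mathbf{F}\mathbf{D}^{-1}\mathbf{F}'\mathbf{L}_\theta'^{-1})\mathbf{L}_\theta^{-1}\mathbf{w} = \mathbf{a}^{0\prime}\mathbf{a}^0 - \hat{\mathbf{a}}_*'\mathbf{D}\hat{\mathbf{a}}_*$$

where $\mathbf{a}^0 = \mathbf{L}_\theta^{-1}\mathbf{w}$ denotes the vector whose elements $a_t^0$ can be calculated recursively
from $a_t^0 = w_t + \theta_1 a_{t-1}^0 + \cdots + \theta_q a_{t-q}^0$, $t = 1, 2, \ldots, n$, by setting the initial values $\mathbf{a}_*$
equal to zero. Hence, the first term described above, $S_*(\boldsymbol{\theta}) = \mathbf{a}^{0\prime}\mathbf{a}^0 = \sum_{t=1}^{n} (a_t^0)^2$, is
the conditional sum-of-squares function, given $\mathbf{a}_* = \mathbf{0}$, as discussed in Section 7.1.2.

2. In addition, we find that

$$\mathbf{M}_n^{(0,q)} = \mathbf{L}_\theta'^{-1}(\mathbf{I}_n - \mathbf{L}_\theta^{-1}\mathbf{F}\mathbf{D}^{-1}\mathbf{F}'\mathbf{L}_\theta'^{-1})\mathbf{L}_\theta^{-1}$$

and $S(\boldsymbol{\theta}) = \mathbf{w}'\mathbf{M}_n^{(0,q)}\mathbf{w}$. Also, by comparing (A7.3.6) and (A7.3.2), we have

$$|\mathbf{D}|^{-1} = |\mathbf{M}_n^{(0,q)}|$$

3. The back-forecasts $\hat{\mathbf{a}}_* = [\mathbf{a}_*]$ can be calculated most conveniently from $\hat{\mathbf{a}}_* = \mathbf{D}^{-1}\mathbf{F}'\mathbf{u}$
(i.e., by solving $\mathbf{D}\hat{\mathbf{a}}_* = \mathbf{F}'\mathbf{u}$), where $\mathbf{u} = \mathbf{L}_\theta'^{-1}\mathbf{L}_\theta^{-1}\mathbf{w} = \mathbf{L}_\theta'^{-1}\mathbf{a}^0 = (u_1, u_2, \ldots, u_n)'$. Note
that the elements $u_t$ of $\mathbf{u}$ are calculated through a backward recursion as

$$u_t = a_t^0 + \theta_1 u_{t+1} + \cdots + \theta_q u_{t+q}$$

from $t = n$ down to $t = 1$, using zero starting values $u_{n+1} = \cdots = u_{n+q} = 0$, where
the $a_t^0$ denote the estimates of the $a_t$ conditional on the zero starting values $\mathbf{a}_* = \mathbf{0}$.
Also, the vector $\mathbf{h} = \mathbf{F}'\mathbf{u}$ consists of the elements $h_j = -\sum_{i=1}^{j} \theta_{q-j+i} u_i$, $j = 1, \ldots, q$.

4. Finally, using (A7.3.6) and (A7.3.7), the unconditional likelihood is given exactly by

$$L(\boldsymbol{\theta}, \sigma_a^2|\mathbf{w}) = (\sigma_a^2)^{-n/2}\, |\mathbf{D}|^{-1/2} \exp\left\{-\frac{1}{2\sigma_a^2} \sum_{t=1-q}^{n} [a_t]^2\right\} \tag{A7.3.8}$$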

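For example, in the MA(1) model with $q = 1$, we have $\mathbf{F}' = -(\theta, 0, \ldots, 0)$, an $n$-dimensional
vector, and $\mathbf{L}_\theta$ is such that $\mathbf{L}_\theta^{-1}$ has first column equal to $(1, \theta, \theta^2, \ldots, \theta^{n-1})'$, so that

$$D = 1 + \mathbf{F}'\mathbf{L}_\theta'^{-1}\mathbf{L}_\theta^{-1}\mathbf{F} = 1 + \theta^2 + \theta^4 + \cdots + \theta^{2n} = \frac{1 - \theta^{2(n+1)}}{1 - \theta^2}$$

In addition, the conditional values $a_t^0$ are computed recursively as $a_t^0 = w_t + \theta a_{t-1}^0$,
$t = 1, 2, \ldots, n$, using the zero initial value $a_0^0 = 0$, and the values of the vector $\mathbf{u} = \mathbf{L}_\theta'^{-1}\mathbf{a}^0$ are
computed in the backward recursion as $u_t = a_t^0 + \theta u_{t+1}$, from $t = n$ to $t = 1$, with $u_{n+1} = 0$.
Then,

$$\hat{a}_* = [a_0] = -D^{-1}\theta u_1 = -D^{-1}u_0 = -\frac{u_0(1 - \theta^2)}{1 - \theta^{2(n+1)}}$$

where $u_0 = a_0^0 + \theta u_1 = \theta u_1$, and the exact likelihood for the MA(1) process is

$$L(\theta, \sigma_a^2|\mathbf{w}) = (\sigma_a^2)^{-n/2}\, \frac{(1 - \theta^2)^{1/2}}{(1 - \theta^{2(n+1)})^{1/2}}
\exp\left\{-\frac{1}{2\sigma_a^2} \sum_{t=0}^{n} [a_t]^2\right\} \tag{A7.3.9}$$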
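The MA(1) recursions above translate directly into a few loops. The following Python sketch (standard library only) computes the exact sum of squares $S(\theta) = \sum_{t=0}^{n} [a_t]^2$ and, as a cross-check, compares it with a brute-force evaluation of $\mathbf{w}'\boldsymbol{\Gamma}^{-1}\mathbf{w}$, where $\boldsymbol{\Gamma}$ is the MA(1) covariance matrix for $\sigma_a^2 = 1$. The function names and the data are illustrative assumptions; a zero mean is assumed.

```python
def ma1_exact_sum_of_squares(w, theta):
    """Exact S(theta) for w_t = a_t - theta*a_{t-1}; returns (S, D, a0_hat)."""
    n = len(w)
    # Forward pass with a_0 = 0 gives the conditional residuals a0_t.
    a0, prev = [], 0.0
    for wt in w:
        prev = wt + theta * prev
        a0.append(prev)
    # Backward pass u_t = a0_t + theta*u_{t+1}, u_{n+1} = 0; only u_1 is needed.
    u1 = 0.0
    for a in reversed(a0):
        u1 = a + theta * u1
    # D = 1 + theta^2 + ... + theta^(2n) = (1 - theta^(2(n+1))) / (1 - theta^2).
    D = sum(theta ** (2 * i) for i in range(n + 1))
    a0_hat = -theta * u1 / D              # back-forecast [a_0]
    # Forward pass again, now started from the back-forecast.
    S, prev = a0_hat ** 2, a0_hat
    for wt in w:
        prev = wt + theta * prev
        S += prev * prev
    return S, D, a0_hat


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def ma1_brute_force(w, theta):
    """w' Gamma^{-1} w, with gamma_0 = 1 + theta^2 and gamma_1 = -theta."""
    n = len(w)
    G = [[(1.0 + theta ** 2) if i == j else (-theta if abs(i - j) == 1 else 0.0)
          for j in range(n)] for i in range(n)]
    return sum(wi * xi for wi, xi in zip(w, solve(G, w)))


w = [0.5, -1.2, 0.3, 0.9, -0.4]           # illustrative data (assumed)
S, D, a0_hat = ma1_exact_sum_of_squares(w, 0.6)
S_check = ma1_brute_force(w, 0.6)
```

The recursive route needs only $O(n)$ operations, whereas the brute-force check inverts an $n \times n$ matrix; the two agree up to rounding, which is the content of the relation $S(\theta) = \mathbf{w}'\mathbf{M}_n^{(0,1)}\mathbf{w}$.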


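Extension to the Autoregressive and Mixed Processes. The method outlined above may
be readily extended to provide the unconditional likelihood for the general mixed model

$$\phi(B)\tilde{w}_t = \theta(B)a_t \tag{A7.3.10}$$

which, with $w_t = \nabla^d z_t$, defines the general ARIMA process. Details of the derivation
have been presented by Newbold (1974) and Ljung and Box (1979), while an alternative
approach to obtain the exact likelihood that uses the Cholesky decomposition of a band
covariance matrix (i.e., the innovations method as discussed in Section 7.4) was given by
Ansley (1979). First, assuming a zero mean for the process, the relations for the ARMA
model may be written in matrix form, similar to before, as

$$\mathbf{L}_\phi\mathbf{w} = \mathbf{L}_\theta\mathbf{a} + \mathbf{F}\mathbf{e}_*$$

where $\mathbf{L}_\phi$ is an $n \times n$ matrix of the same form as $\mathbf{L}_\theta$ but with $\phi_i$'s in place of $\theta_i$'s,
$\mathbf{e}_*' = (\mathbf{w}_*', \mathbf{a}_*') = (w_{1-p}, \ldots, w_0, a_{1-q}, \ldots, a_0)$ is the $(p + q)$-dimensional vector of initial values,
and

$$\mathbf{F} = \begin{bmatrix} \mathbf{A}_p & \mathbf{B}_q \\ \mathbf{0} & \mathbf{0} \end{bmatrix}$$

with

$$\mathbf{A}_p = \begin{bmatrix}
\phi_p & \phi_{p-1} & \cdots & \phi_1 \\
0 & \phi_p & \cdots & \phi_2 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \phi_p
\end{bmatrix}
\quad \text{and} \quad
\mathbf{B}_q = -\begin{bmatrix}
\theta_q & \theta_{q-1} & \cdots & \theta_1 \\
0 & \theta_q & \cdots & \theta_2 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \theta_q
\end{bmatrix}$$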


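Let $\boldsymbol{\Omega}\sigma_a^2 = E[\mathbf{e}_*\mathbf{e}_*']$ denote the covariance matrix of $\mathbf{e}_*$. This matrix has the form

$$\boldsymbol{\Omega}\sigma_a^2 = \sigma_a^2 \begin{bmatrix} \sigma_a^{-2}\boldsymbol{\Gamma}_p & \mathbf{C}' \\ \mathbf{C} & \mathbf{I}_q \end{bmatrix}$$

where $\boldsymbol{\Gamma}_p = E[\mathbf{w}_*\mathbf{w}_*']$ is a $p \times p$ matrix with $(i, j)$th element $\gamma_{i-j}$, and $\sigma_a^2\mathbf{C} = E[\mathbf{a}_*\mathbf{w}_*']$
has elements defined by $E[a_{i-q}w_{j-p}] = \sigma_a^2\psi_{j-i-p+q}$ for $j - i - p + q \ge 0$ and 0 otherwise.
The $\psi_k$ are the coefficients in the infinite MA operator $\psi(B) = \phi^{-1}(B)\theta(B) =
\sum_{k=0}^{\infty} \psi_k B^k$, $\psi_0 = 1$, and are easily determined recursively through equations in Section
3.4. The autocovariances $\gamma_k$ in $\boldsymbol{\Gamma}_p$ can directly be determined in terms of the coefficients
$\phi_i$, $\theta_i$, and $\sigma_a^2$, through use of the first $(p + 1)$ equations (3.4.2) (see, e.g., Ljung and Box,
1979).

Similar to the result in (A7.3.3), since $\mathbf{a} = \mathbf{L}_\theta^{-1}(\mathbf{L}_\phi\mathbf{w} - \mathbf{F}\mathbf{e}_*)$ and $\mathbf{e}_*$ are independent, the
joint distribution of $\mathbf{w}$ and $\mathbf{e}_*$ is

$$p(\mathbf{w}, \mathbf{e}_*|\boldsymbol{\phi}, \boldsymbol{\theta}, \sigma_a^2) = (2\pi\sigma_a^2)^{-(n+p+q)/2}\, |\boldsymbol{\Omega}|^{-1/2}
\exp\left[-\frac{1}{2\sigma_a^2} S(\boldsymbol{\phi}, \boldsymbol{\theta}, \mathbf{e}_*)\right]$$

where

$$S(\boldsymbol{\phi}, \boldsymbol{\theta}, \mathbf{e}_*) = (\mathbf{L}_\phi\mathbf{w} - \mathbf{F}\mathbf{e}_*)'\mathbf{L}_\theta'^{-1}\mathbf{L}_\theta^{-1}(\mathbf{L}_\phi\mathbf{w} - \mathbf{F}\mathbf{e}_*) + \mathbf{e}_*'\boldsymbol{\Omega}^{-1}\mathbf{e}_*$$

Again, by generalized least-squares theory, we can show that

$$S(\boldsymbol{\phi}, \boldsymbol{\theta}, \mathbf{e}_*) = S(\boldsymbol{\phi}, \boldsymbol{\theta}) + (\mathbf{e}_* - \hat{\mathbf{e}}_*)'\mathbf{D}(\mathbf{e}_* - \hat{\mathbf{e}}_*)$$

where

$$S(\boldsymbol{\phi}, \boldsymbol{\theta}) = S(\boldsymbol{\phi}, \boldsymbol{\theta}, \hat{\mathbf{e}}_*) = \hat{\mathbf{a}}'\hat{\mathbf{a}} + \hat{\mathbf{e}}_*'\boldsymbol{\Omega}^{-1}\hat{\mathbf{e}}_* \tag{A7.3.11}$$

is the unconditional sum-of-squares function and

$$\hat{\mathbf{e}}_* = E[\mathbf{e}_*|\mathbf{w}, \boldsymbol{\phi}, \boldsymbol{\theta}] = [\mathbf{e}_*] = \mathbf{D}^{-1}\mathbf{F}'\mathbf{L}_\theta'^{-1}\mathbf{L}_\theta^{-1}\mathbf{L}_\phi\mathbf{w} \tag{A7.3.12}$$

represents the conditional expectation of the preliminary values $\mathbf{e}_*$, with
$\mathbf{D} = \boldsymbol{\Omega}^{-1} + \mathbf{F}'\mathbf{L}_\theta'^{-1}\mathbf{L}_\theta^{-1}\mathbf{F}$, and $\hat{\mathbf{a}} = [\mathbf{a}] = \mathbf{L}_\theta^{-1}(\mathbf{L}_\phi\mathbf{w} - \mathbf{F}\hat{\mathbf{e}}_*)$. By factorization of the joint distribution
of $\mathbf{w}$ and $\mathbf{e}_*$, we can obtain

$$p(\mathbf{w}|\boldsymbol{\phi}, \boldsymbol{\theta}, \sigma_a^2) = (2\pi\sigma_a^2)^{-n/2}\, |\boldsymbol{\Omega}|^{-1/2}\, |\mathbf{D}|^{-1/2}
\exp\left[-\frac{1}{2\sigma_a^2} S(\boldsymbol{\phi}, \boldsymbol{\theta})\right] \tag{A7.3.13}$$

as the unconditional likelihood. It follows immediately from (A7.3.13) that the maximum
likelihood estimate for $\sigma_a^2$ is given by $\hat{\sigma}_a^2 = S(\hat{\boldsymbol{\phi}}, \hat{\boldsymbol{\theta}})/n$, where $\hat{\boldsymbol{\phi}}$ and $\hat{\boldsymbol{\theta}}$ denote maximum
likelihood estimates.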

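Again, we note that $S(\boldsymbol{\phi}, \boldsymbol{\theta}) = \sum_{t=1}^{n} [a_t]^2 + \hat{\mathbf{e}}_*'\boldsymbol{\Omega}^{-1}\hat{\mathbf{e}}_*$, and the elements $[a_1], [a_2], \ldots,
[a_n]$ of $\hat{\mathbf{a}} = [\mathbf{a}]$ are computed recursively from the relation $\mathbf{L}_\theta[\mathbf{a}] = \mathbf{L}_\phi\mathbf{w} - \mathbf{F}[\mathbf{e}_*]$ as

$$[a_t] = w_t - \phi_1[w_{t-1}] - \cdots - \phi_p[w_{t-p}] + \theta_1[a_{t-1}] + \cdots + \theta_q[a_{t-q}]$$

for $t = 1, 2, \ldots, n$, using the back-forecasted values $[\mathbf{e}_*]$ for the preliminary values, with
$[w_t] = w_t$ for $t = 1, 2, \ldots, n$. In addition, the back-forecasts $\hat{\mathbf{e}}_* = [\mathbf{e}_*]$ can be calculated from
$\hat{\mathbf{e}}_* = \mathbf{D}^{-1}\mathbf{F}'\mathbf{u}$, where $\mathbf{u} = \mathbf{L}_\theta'^{-1}\mathbf{L}_\theta^{-1}\mathbf{L}_\phi\mathbf{w} = \mathbf{L}_\theta'^{-1}\mathbf{a}^0$, and the elements $u_t$ of $\mathbf{u}$ are calculated
through the backward recursion as

$$u_t = a_t^0 + \theta_1 u_{t+1} + \cdots + \theta_q u_{t+q}$$

with starting values $u_{n+1} = \cdots = u_{n+q} = 0$, and the $a_t^0$ are the elements of $\mathbf{a}^0 = \mathbf{L}_\theta^{-1}\mathbf{L}_\phi\mathbf{w}$
and denote the estimates of the $a_t$ conditional on zero starting values $\mathbf{e}_* = \mathbf{0}$. Also, the
vector $\mathbf{h} = \mathbf{F}'\mathbf{u}$ consists of the $p + q$ elements:

$$h_j = \begin{cases}
\displaystyle \sum_{i=1}^{j} \phi_{p-j+i}\, u_i & j = 1, \ldots, p \\[2ex]
\displaystyle -\sum_{i=1}^{j-p} \theta_{q-j+p+i}\, u_i & j = p + 1, \ldots, p + q
\end{cases}$$

Finally, using (A7.1.1) and (A7.1.2), in $S(\boldsymbol{\phi}, \boldsymbol{\theta})$ we may write
$\hat{\mathbf{e}}_*'\boldsymbol{\Omega}^{-1}\hat{\mathbf{e}}_* = \hat{\mathbf{a}}_*'\hat{\mathbf{a}}_* + (\hat{\mathbf{w}}_* - \mathbf{C}'\hat{\mathbf{a}}_*)'\mathbf{K}^{-1}(\hat{\mathbf{w}}_* - \mathbf{C}'\hat{\mathbf{a}}_*)$, so that we have

$$S(\boldsymbol{\phi}, \boldsymbol{\theta}) = \sum_{t=1-q}^{n} [a_t]^2 + (\hat{\mathbf{w}}_* - \mathbf{C}'\hat{\mathbf{a}}_*)'\mathbf{K}^{-1}(\hat{\mathbf{w}}_* - \mathbf{C}'\hat{\mathbf{a}}_*) \tag{A7.3.14}$$

where $\mathbf{K} = \sigma_a^{-2}\boldsymbol{\Gamma}_p - \mathbf{C}'\mathbf{C}$, as well as $|\boldsymbol{\Omega}| = |\mathbf{K}|$.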

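Therefore, in general, the likelihood associated with a series $\mathbf{z}$ of $n + d$ values generated
by any ARIMA process is given by

$$L(\boldsymbol{\phi}, \boldsymbol{\theta}, \sigma_a^2|\mathbf{z}) = (2\pi\sigma_a^2)^{-n/2}\, |\mathbf{M}_n^{(p,q)}|^{1/2}
\exp\left[-\frac{S(\boldsymbol{\phi}, \boldsymbol{\theta})}{2\sigma_a^2}\right] \tag{A7.3.15}$$

where

$$S(\boldsymbol{\phi}, \boldsymbol{\theta}) = \sum_{t=1}^{n} [a_t]^2 + \hat{\mathbf{e}}_*'\boldsymbol{\Omega}^{-1}\hat{\mathbf{e}}_*$$

and $|\mathbf{M}_n^{(p,q)}| = |\boldsymbol{\Omega}|^{-1}|\mathbf{D}|^{-1} = |\mathbf{K}|^{-1}|\mathbf{D}|^{-1}$. Also, by expressing the mixed ARMA model
as an infinite moving average $w_t = (1 + \psi_1 B + \psi_2 B^2 + \cdots)a_t$, and referring to results for
the pure MA model, it follows that in the unconditional sum-of-squares function for the
mixed model, we have the relation that $\hat{\mathbf{e}}_*'\boldsymbol{\Omega}^{-1}\hat{\mathbf{e}}_* = \sum_{t=-\infty}^{0} [a_t]^2$. Hence, we also have the
representation $S(\boldsymbol{\phi}, \boldsymbol{\theta}) = \sum_{t=-\infty}^{n} [a_t]^2$, and in practice the values $[a_t]$ may be computed
recursively with the summation proceeding from some point $t = 1 - Q$, beyond which the
$[a_t]$'s are negligible.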

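Special Case: AR(p). In the special case of a pure AR($p$) model, the results described above
simplify somewhat. We then have $\mathbf{e}_* = \mathbf{w}_*$, $\boldsymbol{\Omega} = \sigma_a^{-2}\boldsymbol{\Gamma}_p$, $\mathbf{L}_\theta = \mathbf{I}_n$,
$\mathbf{D} = \sigma_a^2\boldsymbol{\Gamma}_p^{-1} + \mathbf{F}'\mathbf{F} = \sigma_a^2\boldsymbol{\Gamma}_p^{-1} + \mathbf{A}_p'\mathbf{A}_p$, and $\hat{\mathbf{w}}_* = \mathbf{D}^{-1}\mathbf{F}'\mathbf{L}_\phi\mathbf{w} = \mathbf{D}^{-1}\mathbf{A}_p'\mathbf{L}_{11}\mathbf{w}_p$, where
$\mathbf{w}_p' = (w_1, w_2, \ldots, w_p)$ and $\mathbf{L}_{11}$ is the $p \times p$ upper left submatrix of $\mathbf{L}_\phi$. It can then be shown
that the back-forecasts $\hat{w}_t$ are determined from the relations
$\hat{w}_t = \phi_1\hat{w}_{t+1} + \cdots + \phi_p\hat{w}_{t+p}$, $t = 0, -1, \ldots, 1 - p$, with
$\hat{w}_t = w_t$ for $1 \le t \le n$, and hence these are the same as values obtained from the use of
the backward model approach, as discussed in Section 7.1.4, for the special case of the AR
model. Thus, we obtain the exact sum of squares as $S(\boldsymbol{\phi}) = \sum_{t=1}^{n} [a_t]^2 + \sigma_a^2\hat{\mathbf{w}}_*'\boldsymbol{\Gamma}_p^{-1}\hat{\mathbf{w}}_*$.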

To illustrate, consider the first-order autoregressive process in $w_t$,

$$w_t - \phi w_{t-1} = a_t \tag{A7.3.16}$$

where $w_t$ might be the $d$th difference $\nabla^d z_t$ of the actual observations and a series $\mathbf{z}$ of
length $n + d$ observations is available. To compute the likelihood (A7.3.15), we require

$$\begin{aligned}
S(\phi) &= \sum_{t=1}^{n} [a_t]^2 + \sigma_a^2\, \frac{\hat{w}_0^2}{\gamma_0} \\
&= \sum_{t=2}^{n} (w_t - \phi w_{t-1})^2 + (w_1 - \phi\hat{w}_0)^2 + (1 - \phi^2)\hat{w}_0^2
\end{aligned}$$

since $\Gamma_1 = \gamma_0 = \sigma_a^2(1 - \phi^2)^{-1}$. Now, because $D = \sigma_a^2\Gamma_1^{-1} + A_1'A_1 = \sigma_a^2\gamma_0^{-1} + \phi^2 = 1$, and
hence $\hat{w}_0 = \phi w_1$, substituting this into the last two terms of $S(\phi)$ above, it reduces to

$$S(\phi) = \sum_{t=2}^{n} (w_t - \phi w_{t-1})^2 + (1 - \phi^2)w_1^2 \tag{A7.3.17}$$

a result that may be obtained more directly by methods discussed in Appendix A7.4.

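Special case: ARMA(1,1). As an example for the mixed model, consider the ARMA(1,1)
model

$$w_t - \phi w_{t-1} = a_t - \theta a_{t-1}$$

Then, we have $\mathbf{e}_*' = (w_0, a_0)$, $A_1 = \phi$, $B_1 = -\theta$, and

$$\sigma_a^2\boldsymbol{\Omega} = \sigma_a^2 \begin{bmatrix} \sigma_a^{-2}\gamma_0 & 1 \\ 1 & 1 \end{bmatrix} \tag{A7.3.18}$$

with $\sigma_a^{-2}\gamma_0 = (1 + \theta^2 - 2\phi\theta)/(1 - \phi^2)$. Thus, we have

$$\mathbf{D} = \boldsymbol{\Omega}^{-1} + \mathbf{F}'\mathbf{L}_\theta'^{-1}\mathbf{L}_\theta^{-1}\mathbf{F}
= \frac{1}{\sigma_a^{-2}\gamma_0 - 1} \begin{bmatrix} 1 & -1 \\ -1 & \sigma_a^{-2}\gamma_0 \end{bmatrix}
+ \frac{1 - \theta^{2n}}{1 - \theta^2} \begin{bmatrix} \phi^2 & -\phi\theta \\ -\phi\theta & \theta^2 \end{bmatrix}$$

and the estimates of the initial values are obtained as $\hat{\mathbf{e}}_* = \mathbf{D}^{-1}\mathbf{h}$, where $\mathbf{h}' =
(h_1, h_2) = (\phi, -\theta)u_1$, the $u_t$ are obtained from the backward recursion $u_t = a_t^0 + \theta u_{t+1}$,
$u_{n+1} = 0$, and $a_t^0 = w_t - \phi w_{t-1}^0 + \theta a_{t-1}^0$, $t = 1, 2, \ldots, n$, are obtained using the zero initial
values $w_0^0 = a_0^0 = 0$, where $w_t^0 = w_t$ for $1 \le t \le n$. Thus, the exact sum of squares is obtained
as

$$S(\phi, \theta) = \sum_{t=0}^{n} [a_t]^2 + \frac{(\hat{w}_0 - \hat{a}_0)^2}{\sigma_a^{-2}\gamma_0 - 1} \tag{A7.3.19}$$

with $[a_t] = w_t - \phi[w_{t-1}] + \theta[a_{t-1}]$, $t = 1, 2, \ldots, n$, and $\sigma_a^{-2}\gamma_0 - 1 = K = (\phi - \theta)^2/(1 - \phi^2)$.
In addition, we have $|\mathbf{M}_n^{(1,1)}| = \{|K|\,|\mathbf{D}|\}^{-1}$, with

$$|K|\,|\mathbf{D}| = 1 + \frac{1 - \theta^{2n}}{1 - \theta^2}\, \frac{(\phi - \theta)^2}{1 - \phi^2}$$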
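The ARMA(1,1) steps above can be sketched in Python (standard library only). The code follows (A7.3.18) and (A7.3.19) under the assumptions of a zero mean, $|\phi| < 1$, and $\phi \ne \theta$ (otherwise $K = 0$ and the division fails); the function names and data are hypothetical. A brute-force evaluation of $\mathbf{w}'\boldsymbol{\Gamma}^{-1}\mathbf{w}$ with the ARMA(1,1) autocovariances (for $\sigma_a^2 = 1$) is included as a cross-check.

```python
def arma11_exact_sum_of_squares(w, phi, theta):
    """Exact S(phi, theta) for w_t - phi*w_{t-1} = a_t - theta*a_{t-1}.

    Returns (S, K*|D|). Assumes |phi| < 1 and phi != theta.
    """
    n = len(w)
    g = (1.0 + theta ** 2 - 2.0 * phi * theta) / (1.0 - phi ** 2)  # gamma_0 / sigma_a^2
    K = g - 1.0
    # Conditional residuals a0_t with zero initial values w_0 = a_0 = 0.
    a0, a_prev, w_prev = [], 0.0, 0.0
    for wt in w:
        a_prev = wt - phi * w_prev + theta * a_prev
        a0.append(a_prev)
        w_prev = wt
    # Backward recursion u_t = a0_t + theta*u_{t+1}; only u_1 is needed.
    u1 = 0.0
    for a in reversed(a0):
        u1 = a + theta * u1
    h = (phi * u1, -theta * u1)
    c = sum(theta ** (2 * i) for i in range(n))   # (1 - theta^(2n)) / (1 - theta^2)
    d11 = 1.0 / K + c * phi * phi
    d12 = -1.0 / K - c * phi * theta
    d22 = g / K + c * theta * theta
    det = d11 * d22 - d12 * d12
    w0_hat = (d22 * h[0] - d12 * h[1]) / det      # back-forecast [w_0]
    a0_hat = (d11 * h[1] - d12 * h[0]) / det      # back-forecast [a_0]
    # Forward recursion for [a_t], started from the back-forecasts.
    S = a0_hat ** 2 + (w0_hat - a0_hat) ** 2 / K
    a_prev, w_prev = a0_hat, w0_hat
    for wt in w:
        a_prev = wt - phi * w_prev + theta * a_prev
        S += a_prev * a_prev
        w_prev = wt
    return S, K * det


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def arma11_brute_force(w, phi, theta):
    """w' Gamma^{-1} w using gamma_0, gamma_1 = phi*gamma_0 - theta, gamma_k = phi*gamma_{k-1}."""
    n = len(w)
    gam = [(1.0 + theta ** 2 - 2.0 * phi * theta) / (1.0 - phi ** 2)]
    if n > 1:
        gam.append(phi * gam[0] - theta)
    for _ in range(2, n):
        gam.append(phi * gam[-1])
    G = [[gam[abs(i - j)] for j in range(n)] for i in range(n)]
    return sum(wi * xi for wi, xi in zip(w, solve(G, w)))


w = [0.5, -1.2, 0.3, 0.9, -0.4, 0.1]      # illustrative data (assumed)
S, KD = arma11_exact_sum_of_squares(w, 0.7, 0.3)
S_check = arma11_brute_force(w, 0.7, 0.3)
```

The second returned value can be compared with the closed form for $|K|\,|\mathbf{D}|$ given above, which supplies the determinant term of the likelihood without forming any $n \times n$ matrix.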


APPENDIX A7.4 EXACT LIKELIHOOD FUNCTION FOR AN
AUTOREGRESSIVE PROCESS

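We now suppose that a given series $\mathbf{w}' = (w_1, w_2, \ldots, w_n)$ is generated by the $p$th-order
stationary autoregressive model:

$$w_t - \phi_1 w_{t-1} - \phi_2 w_{t-2} - \cdots - \phi_p w_{t-p} = a_t$$

where, temporarily, the $w_t$'s are assumed to have mean $\mu = 0$, but as before, the argument
can be extended to the case where $\mu \ne 0$. Assuming normality for the $a_t$'s and hence for
the $w_t$'s, the joint probability density function of the $w_t$'s is

$$p(\mathbf{w}|\boldsymbol{\phi}, \sigma_a^2) = (2\pi\sigma_a^2)^{-n/2}\, |\mathbf{M}_n^{(p,0)}|^{1/2}
\exp\left[-\frac{\mathbf{w}'\mathbf{M}_n^{(p,0)}\mathbf{w}}{2\sigma_a^2}\right] \tag{A7.4.1}$$

and because of the reversible character of the general process, the $n \times n$ matrix $\mathbf{M}_n^{(p,0)}$
is symmetric about both of its principal diagonals. Such a matrix is said to be doubly
symmetric. Now,

$$p(\mathbf{w}|\boldsymbol{\phi}, \sigma_a^2) = p(w_{p+1}, w_{p+2}, \ldots, w_n|\mathbf{w}_p, \boldsymbol{\phi}, \sigma_a^2)\, p(\mathbf{w}_p|\boldsymbol{\phi}, \sigma_a^2)$$

where $\mathbf{w}_p' = (w_1, w_2, \ldots, w_p)$. The first factor on the right may be obtained by making use
of the distribution

$$p(a_{p+1}, \ldots, a_n) = (2\pi\sigma_a^2)^{-(n-p)/2} \exp\left[-\frac{1}{2\sigma_a^2} \sum_{t=p+1}^{n} a_t^2\right] \tag{A7.4.2a}$$

For fixed $\mathbf{w}_p$, $(a_{p+1}, \ldots, a_n)$ and $(w_{p+1}, \ldots, w_n)$ are related by the transformation

$$\begin{aligned}
a_{p+1} &= w_{p+1} - \phi_1 w_p - \cdots - \phi_p w_1 \\
&\ \,\vdots \\
a_n &= w_n - \phi_1 w_{n-1} - \cdots - \phi_p w_{n-p}
\end{aligned}$$

which has unit Jacobian. Thus, we obtain

$$p(w_{p+1}, \ldots, w_n|\mathbf{w}_p, \boldsymbol{\phi}, \sigma_a^2) = (2\pi\sigma_a^2)^{-(n-p)/2}
\exp\left[-\frac{1}{2\sigma_a^2} \sum_{t=p+1}^{n} (w_t - \phi_1 w_{t-1} - \cdots - \phi_p w_{t-p})^2\right] \tag{A7.4.2b}$$

Also,

$$p(\mathbf{w}_p|\boldsymbol{\phi}, \sigma_a^2) = (2\pi\sigma_a^2)^{-p/2}\, |\mathbf{M}_p^{(p,0)}|^{1/2}
\exp\left[-\frac{1}{2\sigma_a^2} \mathbf{w}_p'\mathbf{M}_p^{(p,0)}\mathbf{w}_p\right]$$

Thus,

$$p(\mathbf{w}|\boldsymbol{\phi}, \sigma_a^2) = (2\pi\sigma_a^2)^{-n/2}\, |\mathbf{M}_p^{(p,0)}|^{1/2}
\exp\left[-\frac{S(\boldsymbol{\phi})}{2\sigma_a^2}\right] \tag{A7.4.3}$$

where, on multiplying the two factors,

$$S(\boldsymbol{\phi}) = \mathbf{w}_p'\mathbf{M}_p^{(p,0)}\mathbf{w}_p + \sum_{t=p+1}^{n} (w_t - \phi_1 w_{t-1} - \cdots - \phi_p w_{t-p})^2 \tag{A7.4.4}$$

and the elements of $\mathbf{M}_p^{(p)} = \mathbf{M}_p^{(p,0)}$ can now be deduced from the consideration that both
$\mathbf{M}_p^{(p)}$ and $\mathbf{M}_{p+1}^{(p)}$ are doubly symmetric. Thus, for example,

$$\mathbf{M}_2^{(1)} = \begin{bmatrix} m_{11}^{(1)} + \phi_1^2 & -\phi_1 \\ -\phi_1 & 1 \end{bmatrix}
= \begin{bmatrix} 1 & -\phi_1 \\ -\phi_1 & m_{11}^{(1)} + \phi_1^2 \end{bmatrix}$$

and after equating elements in the two matrices, we have

$$\mathbf{M}_1^{(1)} = m_{11}^{(1)} = 1 - \phi_1^2$$

Proceeding in this way, we find for processes of orders 1 and 2:

$$\mathbf{M}_1^{(1)} = 1 - \phi_1^2 \qquad |\mathbf{M}_1^{(1)}| = 1 - \phi_1^2$$

$$\mathbf{M}_2^{(2)} = \begin{bmatrix} 1 - \phi_2^2 & -\phi_1(1 + \phi_2) \\ -\phi_1(1 + \phi_2) & 1 - \phi_2^2 \end{bmatrix}
\qquad |\mathbf{M}_2^{(2)}| = (1 + \phi_2)^2[(1 - \phi_2)^2 - \phi_1^2]$$

For example, when $p = 1$,

$$p(\mathbf{w}|\phi, \sigma_a^2) = (2\pi\sigma_a^2)^{-n/2} (1 - \phi^2)^{1/2}
\exp\left\{-\frac{1}{2\sigma_a^2}\left[(1 - \phi^2)w_1^2 + \sum_{t=2}^{n} (w_t - \phi w_{t-1})^2\right]\right\}$$

which checks with the result obtained in (A7.3.17). The process of generation must lead to
matrices whose elements are quadratic in the $\phi$'s.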

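Thus, it is clear from (A7.4.4) that not only is $S(\boldsymbol{\phi}) = \mathbf{w}'\mathbf{M}_n^{(p)}\mathbf{w}$ a quadratic form in
the $w_t$'s, but it is also quadratic in the parameters $\boldsymbol{\phi}$. Writing $\boldsymbol{\phi}_u' = (1, \phi_1, \phi_2, \ldots, \phi_p)$, it is
clearly true that for some $(p + 1) \times (p + 1)$ matrix $\mathbf{D}$ whose elements are quadratic functions
of the $w_t$'s,

$$\mathbf{w}'\mathbf{M}_n^{(p)}\mathbf{w} = \boldsymbol{\phi}_u'\mathbf{D}\boldsymbol{\phi}_u$$

Now, write

$$\mathbf{D} = \begin{bmatrix}
D_{11} & -D_{12} & -D_{13} & \cdots & -D_{1,p+1} \\
-D_{12} & D_{22} & D_{23} & \cdots & D_{2,p+1} \\
\vdots & \vdots & \vdots & & \vdots \\
-D_{1,p+1} & D_{2,p+1} & D_{3,p+1} & \cdots & D_{p+1,p+1}
\end{bmatrix} \tag{A7.4.6}$$

Inspection of (A7.4.4) shows that the elements $D_{ij}$ are "symmetric" sums of squares and
lagged products, defined by

$$D_{ij} = D_{ji} = w_i w_j + w_{i+1}w_{j+1} + \cdots + w_{n+1-j}w_{n+1-i} \tag{A7.4.7}$$

where the sum contains $n - (i - 1) - (j - 1)$ terms.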
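The symmetric sums (A7.4.7) and the quadratic form $\boldsymbol{\phi}_u'\mathbf{D}\boldsymbol{\phi}_u$ are easy to compute directly. The Python sketch below (standard library only) builds the $D_{ij}$, applies the sign pattern of (A7.4.6), and compares the result for $p = 2$ with the direct expression $\mathbf{w}_2'\mathbf{M}_2^{(2)}\mathbf{w}_2 + \sum_{t=3}^{n}(w_t - \phi_1 w_{t-1} - \phi_2 w_{t-2})^2$. The function names and the data are assumptions made for illustration.

```python
def symmetric_sums(w, p):
    """Matrix of D_ij from (A7.4.7); entry [i-1][j-1] holds D_ij for 1 <= i, j <= p+1."""
    n = len(w)
    D = [[0.0] * (p + 1) for _ in range(p + 1)]
    for i in range(1, p + 2):
        for j in range(1, p + 2):
            terms = n - (i - 1) - (j - 1)          # number of products in the sum
            D[i - 1][j - 1] = sum(w[i - 1 + k] * w[j - 1 + k] for k in range(terms))
    return D


def s_quadratic_form(w, phi):
    """S(phi) = phi_u' D phi_u with the sign pattern of (A7.4.6)."""
    p = len(phi)
    D = symmetric_sums(w, p)
    phi_u = [1.0] + list(phi)
    total = 0.0
    for i in range(p + 1):
        for j in range(p + 1):
            # Off-diagonal entries in the first row and first column carry a minus sign.
            sign = -1.0 if (i == 0) != (j == 0) else 1.0
            total += sign * phi_u[i] * D[i][j] * phi_u[j]
    return total


def s_direct_ar2(w, phi1, phi2):
    """w_2' M_2 w_2 plus the conditional sum of squares, using M_2 for an AR(2) process."""
    head = ((1.0 - phi2 ** 2) * (w[0] ** 2 + w[1] ** 2)
            - 2.0 * phi1 * (1.0 + phi2) * w[0] * w[1])
    tail = sum((w[t] - phi1 * w[t - 1] - phi2 * w[t - 2]) ** 2 for t in range(2, len(w)))
    return head + tail


w = [0.4, -0.1, 0.7, 0.2, -0.5, 0.3, 0.1]   # illustrative data (assumed)
s_quad = s_quadratic_form(w, [0.5, -0.3])
s_dir = s_direct_ar2(w, 0.5, -0.3)
```

Because $\mathbf{D}$ depends only on the data, it can be computed once and reused for every trial value of $\boldsymbol{\phi}$ when the sum of squares is evaluated repeatedly.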

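Finally, we can write the exact probability density, and hence the exact likelihood, as

$$p(\mathbf{w}|\boldsymbol{\phi}, \sigma_a^2) = L(\boldsymbol{\phi}, \sigma_a^2|\mathbf{w}) = (2\pi\sigma_a^2)^{-n/2}\, |\mathbf{M}_p^{(p)}|^{1/2}
\exp\left[-\frac{S(\boldsymbol{\phi})}{2\sigma_a^2}\right] \tag{A7.4.8}$$

where

$$S(\boldsymbol{\phi}) = \mathbf{w}_p'\mathbf{M}_p^{(p)}\mathbf{w}_p + \sum_{t=p+1}^{n} (w_t - \phi_1 w_{t-1} - \cdots - \phi_p w_{t-p})^2 = \boldsymbol{\phi}_u'\mathbf{D}\boldsymbol{\phi}_u \tag{A7.4.9}$$

and the log-likelihood is

$$l(\boldsymbol{\phi}, \sigma_a^2|\mathbf{w}) = -\frac{n}{2}\ln(\sigma_a^2) + \frac{1}{2}\ln|\mathbf{M}_p^{(p)}| - \frac{S(\boldsymbol{\phi})}{2\sigma_a^2} \tag{A7.4.10}$$

For example, when $p = 1$, we have

$$\begin{aligned}
S(\phi) &= (1 - \phi^2)w_1^2 + \sum_{t=2}^{n} (w_t - \phi w_{t-1})^2 \\
&= \sum_{t=1}^{n} w_t^2 - 2\phi \sum_{t=2}^{n} w_{t-1}w_t + \phi^2 \sum_{t=2}^{n-1} w_t^2 = D_{11} - 2\phi D_{12} + \phi^2 D_{22}
\end{aligned}$$

Maximum Likelihood Estimates. Differentiating with respect to $\sigma_a^2$ and each of the $\phi$'s in
(A7.4.10), we obtain

$$\frac{\partial l}{\partial \sigma_a^2} = -\frac{n}{2\sigma_a^2} + \frac{S(\boldsymbol{\phi})}{2(\sigma_a^2)^2} \tag{A7.4.11}$$

$$\frac{\partial l}{\partial \phi_j} = M_j + \sigma_a^{-2}(D_{1,j+1} - \phi_1 D_{2,j+1} - \cdots - \phi_p D_{p+1,j+1})
\qquad j = 1, 2, \ldots, p \tag{A7.4.12}$$

where

$$M_j = \frac{\partial\left(\frac{1}{2}\ln|\mathbf{M}_p^{(p)}|\right)}{\partial \phi_j}$$

Hence, maximum likelihood estimates may be obtained by equating these expressions to
zero and solving the resultant equations.

We have at once from (A7.4.11)

$$\hat{\sigma}_a^2 = \frac{S(\hat{\boldsymbol{\phi}})}{n} \tag{A7.4.13}$$

Estimates of $\boldsymbol{\phi}$. A difficulty occurs in dealing with equation (A7.4.12) since, in general,
the quantities $M_j$ ($j = 1, 2, \ldots, p$) are complicated functions of the $\phi$'s. We consider briefly
four alternative approximations.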


1. Least-Squares Estimates. Since the expected value of $S(\boldsymbol{\phi})$ is proportional to $n$, while
the value of $|\mathbf{M}_p^{(p)}|$ is independent of $n$, (A7.4.8) is for moderate or large sample sizes
dominated by the term in $S(\boldsymbol{\phi})$ and the term in $|\mathbf{M}_p^{(p)}|$ is, by comparison, small.
If we ignore the influence of this term, then

$$l(\boldsymbol{\phi}, \sigma_a^2|\mathbf{w}) \simeq -\frac{n}{2}\ln(\sigma_a^2) - \frac{S(\boldsymbol{\phi})}{2\sigma_a^2} \tag{A7.4.14}$$

and the estimates $\hat{\boldsymbol{\phi}}$ of $\boldsymbol{\phi}$ obtained by maximization of (A7.4.14) are the least-squares
estimates obtained by minimizing $S(\boldsymbol{\phi})$. Now, from (A7.4.9), $S(\boldsymbol{\phi}) = \boldsymbol{\phi}_u'\mathbf{D}\boldsymbol{\phi}_u$, where
$\mathbf{D}$ is a $(p + 1) \times (p + 1)$ matrix of symmetric sums of squares and products, defined
in (A7.4.7). Thus, on differentiating, the minimizing values are

$$\begin{aligned}
D_{12} &= \hat{\phi}_1 D_{22} + \hat{\phi}_2 D_{23} + \cdots + \hat{\phi}_p D_{2,p+1} \\
D_{13} &= \hat{\phi}_1 D_{23} + \hat{\phi}_2 D_{33} + \cdots + \hat{\phi}_p D_{3,p+1} \\
&\ \,\vdots \\
D_{1,p+1} &= \hat{\phi}_1 D_{2,p+1} + \hat{\phi}_2 D_{3,p+1} + \cdots + \hat{\phi}_p D_{p+1,p+1}
\end{aligned} \tag{A7.4.15}$$

which, in an obvious matrix notation, can be written as

$$\mathbf{d} = \mathbf{D}_p\hat{\boldsymbol{\phi}}$$

so that

$$\hat{\boldsymbol{\phi}} = \mathbf{D}_p^{-1}\mathbf{d}$$

These least-squares estimates also maximize the posterior density (7.5.15).

2. Approximate Maximum Likelihood Estimates. We now recall an earlier result (3.2.3),
which may be written as

$$\gamma_j - \phi_1\gamma_{j-1} - \phi_2\gamma_{j-2} - \cdots - \phi_p\gamma_{j-p} = 0 \qquad j > 0 \tag{A7.4.16}$$

Also, on taking expectations in (A7.4.12) and using the fact that $E[\partial l/\partial \phi_j] = 0$, we
obtain

$$M_j\sigma_a^2 + (n - j)\gamma_j - (n - j - 1)\phi_1\gamma_{j-1} - (n - j - 2)\phi_2\gamma_{j-2}
- \cdots - (n - j - p)\phi_p\gamma_{j-p} = 0 \tag{A7.4.17}$$

After multiplying (A7.4.16) by $n$ and subtracting the result from (A7.4.17), we obtain

$$M_j\sigma_a^2 = j\gamma_j - (j + 1)\phi_1\gamma_{j-1} - \cdots - (j + p)\phi_p\gamma_{j-p}$$

Therefore, on using $D_{i+1,j+1}/(n - j - i)$ as an estimate of $\gamma_{|j-i|}$, a natural estimate
of $M_j\sigma_a^2$ is

$$j\,\frac{D_{1,j+1}}{n - j} - (j + 1)\phi_1\,\frac{D_{2,j+1}}{n - j - 1} - \cdots - (j + p)\phi_p\,\frac{D_{p+1,j+1}}{n - j - p}$$

Substituting this estimate in (A7.4.12) yields

$$\frac{\partial l}{\partial \phi_j} \simeq n\sigma_a^{-2}\left(\frac{D_{1,j+1}}{n - j} - \phi_1\,\frac{D_{2,j+1}}{n - j - 1} - \cdots - \phi_p\,\frac{D_{p+1,j+1}}{n - j - p}\right)
\qquad j = 1, 2, \ldots, p \tag{A7.4.18}$$

leading to a set of linear equations of the form (A7.4.15), but now with

$$D_{ij}^* = \frac{nD_{ij}}{n - (i - 1) - (j - 1)}$$

replacing $D_{ij}$.

3. Conditional Least-Squares Estimates. For moderate and relatively large $n$, we might
also consider the conditional sum-of-squares function, obtained by adopting the
procedure in Section 7.1.3. This yields the sum of squares given in the exponent of
the expression in (A7.4.2),

$$S_*(\boldsymbol{\phi}) = \sum_{t=p+1}^{n} (w_t - \phi_1 w_{t-1} - \cdots - \phi_p w_{t-p})^2$$

and is the sum of squares associated with the conditional distribution of $w_{p+1}, \ldots, w_n$,
given $\mathbf{w}_p' = (w_1, w_2, \ldots, w_p)$. Conditional least-squares estimates are obtained by
minimizing $S_*(\boldsymbol{\phi})$, which is a standard linear least-squares regression problem
associated with the linear model $w_t = \phi_1 w_{t-1} + \phi_2 w_{t-2} + \cdots + \phi_p w_{t-p} + a_t$,
$t = p + 1, \ldots, n$. This results in the familiar least-squares estimates $\hat{\boldsymbol{\phi}} = \tilde{\mathbf{D}}_p^{-1}\tilde{\mathbf{d}}$, as in
(A7.2.5), where $\tilde{\mathbf{D}}_p$ has $(i, j)$th element $\tilde{D}_{ij} = \sum_{t=p+1}^{n} w_{t-i}w_{t-j}$ and $\tilde{\mathbf{d}}$ has $i$th
element $\tilde{d}_i = \sum_{t=p+1}^{n} w_{t-i}w_t$.

4. Yule-Walker Estimates. Finally, if n is moderate or large, as an approximation, we may replace the symmetric sums of squares and products in (A7.4.15) by n times the appropriate autocovariance estimate. For example, $D_{ij}$, where $|i - j| = k$, would be replaced by $nc_k = \sum_{t=1}^{n-k} w_t w_{t+k}$. On dividing by $nc_0$ throughout in the resultant equations, we obtain the following relations expressed in terms of the estimated autocorrelations $r_k = c_k/c_0$:

$$\begin{aligned}
r_1 &= \hat\phi_1 + \hat\phi_2 r_1 + \cdots + \hat\phi_p r_{p-1}\\
r_2 &= \hat\phi_1 r_1 + \hat\phi_2 + \cdots + \hat\phi_p r_{p-2}\\
&\;\;\vdots\\
r_p &= \hat\phi_1 r_{p-1} + \hat\phi_2 r_{p-2} + \cdots + \hat\phi_p
\end{aligned}$$

These are the well-known Yule-Walker equations.

In the matrix notation (7.3.1), they can be written $\mathbf r = \mathbf R\hat{\boldsymbol\phi}$, so that

$$\hat{\boldsymbol\phi} = \mathbf R^{-1}\mathbf r \tag{A7.4.19}$$

which corresponds to equations (3.2.7), with $\mathbf r$ substituted for $\boldsymbol\rho_p$ and $\mathbf R$ for $\mathbf P_p$.
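The Yule-Walker relations can be solved without forming a matrix inverse. The following is a minimal Python sketch, assuming a mean-corrected series and using the Levinson-Durbin recursion as the solver; the function names `sample_autocorrelations` and `yule_walker` are illustrative and do not come from the text.

```python
def sample_autocorrelations(w, max_lag):
    """Return [r_0, ..., r_max_lag] using c_k = (1/n) * sum_t w_t * w_{t+k}.

    Assumption: w has already been mean-corrected.
    """
    n = len(w)
    c = [sum(w[t] * w[t + k] for t in range(n - k)) / n
         for k in range(max_lag + 1)]
    return [ck / c[0] for ck in c]


def yule_walker(r, p):
    """Solve the Yule-Walker equations r = R * phi by Levinson-Durbin.

    r is a list with r[0] = 1 and r[k] the lag-k autocorrelation.
    Returns [phi_1, ..., phi_p].
    """
    phi = []
    for k in range(1, p + 1):
        # Partial autocorrelation at lag k from the order k-1 solution.
        num = r[k] - sum(phi[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the first k-1 coefficients, then append the new one.
        phi = [phi[j] - phi_kk * phi[k - 2 - j] for j in range(k - 1)] + [phi_kk]
    return phi


# Usage: an AR(2) fit from two given autocorrelations.
phi_hat = yule_walker([1.0, 0.8, 0.5], 2)
```

For p = 2 the recursion reproduces the closed forms $\hat\phi_1 = r_1(1 - r_2)/(1 - r_1^2)$ and $\hat\phi_2 = (r_2 - r_1^2)/(1 - r_1^2)$.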

To illustrate the differences among the four estimates, take the case $p = 1$. Then, $M_1\sigma_a^2 = -\gamma_1$ and, corresponding to (A7.4.12), the exact maximum likelihood estimate of $\phi$ is the solution of

$$-\gamma_1 + D_{12} - \phi D_{22} \equiv -\gamma_1 + \sum_{t=2}^{n} w_t w_{t-1} - \phi\sum_{t=2}^{n-1} w_t^2 = 0$$

Note that $\gamma_1 = \sigma_a^2\phi/(1 - \phi^2)$ and the maximum likelihood solution for $\sigma_a^2$, $\hat\sigma_a^2 = S(\phi)/n$ from (A7.4.13), can be substituted in the expression for $\gamma_1$ in the likelihood equation above, where $S(\phi) = D_{11} - 2\phi D_{12} + \phi^2 D_{22}$ as in (A7.4.9). This results in a cubic equation in $\phi$, whose solution yields the maximum likelihood estimate of $\phi$. Upon rearranging, the cubic equation for $\hat\phi$ can be written as

$$(n-1)D_{22}\hat\phi^3 - (n-2)D_{12}\hat\phi^2 - (nD_{22} + D_{11})\hat\phi + nD_{12} = 0 \tag{A7.4.20}$$

and there is a single unique solution to this cubic equation such that $-1 < \hat\phi < 1$ (e.g., Anderson, 1971, p. 354).
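As an illustration, the cubic (A7.4.20) can be solved numerically. The sketch below is an assumption-laden example rather than the book's procedure: it uses simple bisection on $(-1, 1)$, relying on the fact that the cubic is nonnegative at $\phi = -1$ and nonpositive at $\phi = 1$, and the function name `exact_ml_ar1` is hypothetical.

```python
def exact_ml_ar1(w, tol=1e-12):
    """Root of the cubic (A7.4.20) in (-1, 1) for a mean-corrected series w."""
    n = len(w)
    d11 = sum(x * x for x in w)                       # sum of w_t^2, t = 1..n
    d12 = sum(w[t] * w[t - 1] for t in range(1, n))   # sum of w_t w_{t-1}, t = 2..n
    d22 = sum(x * x for x in w[1:n - 1])              # sum of w_t^2, t = 2..n-1

    def cubic(phi):
        return ((n - 1) * d22 * phi ** 3 - (n - 2) * d12 * phi ** 2
                - (n * d22 + d11) * phi + n * d12)

    # cubic(-1) = sum (w_t + w_{t-1})^2 >= 0 and cubic(1) = -sum (w_t - w_{t-1})^2 <= 0,
    # so bisection keeps a sign change inside [lo, hi].
    lo, hi = -1.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cubic(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Usage on a short made-up series (illustrative data only).
phi_ml = exact_ml_ar1([1.0, 2.0, 1.0, -1.0, -2.0, -1.0])
```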

Approximation 1 corresponds to ignoring the term $\gamma_1$ altogether, yielding

$$\hat\phi = \frac{\sum_{t=2}^{n} w_t w_{t-1}}{\sum_{t=2}^{n-1} w_t^2} = \frac{D_{12}}{D_{22}}$$

Approximation 2 corresponds to substituting the estimate $\sum_{t=2}^{n} w_t w_{t-1}/(n-1)$ for $\gamma_1$, yielding

$$\hat\phi = \frac{\sum_{t=2}^{n} w_t w_{t-1}/(n-1)}{\sum_{t=2}^{n-1} w_t^2/(n-2)} = \frac{n-2}{n-1}\,\frac{D_{12}}{D_{22}}$$

Approximation 3 corresponds to the standard linear model least-squares estimate obtained by regression of $w_t$ on $w_{t-1}$ for $t = 2, 3, \ldots, n$, so that

$$\hat\phi = \frac{\sum_{t=2}^{n} w_t w_{t-1}}{\sum_{t=2}^{n} w_{t-1}^2} = \frac{D_{12}}{D_{22} + w_1^2}$$

In effect, this can be viewed as obtained by substituting $\phi w_1^2$ for $\gamma_1$ in the likelihood equation above for $\phi$.

Approximation 4 replaces the numerator and denominator by standard autocovariance estimates (2.1.12), yielding

$$\hat\phi = \frac{\sum_{t=2}^{n} w_t w_{t-1}}{\sum_{t=1}^{n} w_t^2} = \frac{c_1}{c_0} = \frac{D_{12}}{D_{11}}$$


Usually, as in this example, for moderate and large samples, the differences between 
the estimates given by the various approximations will be small. We have often employed 
the least-squares estimates given by approximation 1 which can be computed directly from 
(A7.4.15). However, for computer calculations, it is often simplest, even when the fitted 
model is autoregressive, to use the general iterative algorithm described in Section 7.2.1, 
which computes least-squares estimates for any ARMA process. 
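To make the comparison concrete for p = 1, the four closed-form approximations can be computed side by side. This is a small Python sketch under the assumption of a mean-corrected series; the function name `ar1_approximations` and the dictionary keys are illustrative choices.

```python
def ar1_approximations(w):
    """Four approximate AR(1) estimates built from D11, D12, D22."""
    n = len(w)
    d11 = sum(x * x for x in w)                       # t = 1..n
    d12 = sum(w[t] * w[t - 1] for t in range(1, n))   # t = 2..n
    d22 = sum(x * x for x in w[1:n - 1])              # t = 2..n-1
    return {
        "least_squares": d12 / d22,                        # approximation 1
        "approx_ml": (n - 2) / (n - 1) * d12 / d22,        # approximation 2
        "conditional_ls": d12 / (d22 + w[0] ** 2),         # approximation 3
        "yule_walker": d12 / d11,                          # approximation 4
    }


# Usage on a short made-up series (illustrative data only).
estimates = ar1_approximations([1.0, 2.0, 1.0, -1.0, -2.0, -1.0])
```

With only six observations the four values differ noticeably; for series of moderate length the differences shrink, since the denominators differ only by end terms.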


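Estimate of $\sigma_a^2$. Using approximation 4 with (A7.4.9) and (A7.4.13),

$$\hat\sigma_a^2 = \frac{S(\hat{\boldsymbol\phi})}{n} = c_0\begin{bmatrix}1 & \hat{\boldsymbol\phi}'\end{bmatrix}\begin{bmatrix}1 & -\mathbf r'\\ -\mathbf r & \mathbf R\end{bmatrix}\begin{bmatrix}1\\ \hat{\boldsymbol\phi}\end{bmatrix}$$

On multiplying out the right-hand side and recalling that $\mathbf r - \mathbf R\hat{\boldsymbol\phi} = \mathbf 0$, we find that

$$\hat\sigma_a^2 = c_0(1 - \mathbf r'\hat{\boldsymbol\phi}) = c_0(1 - \mathbf r'\mathbf R^{-1}\mathbf r) = c_0(1 - \hat{\boldsymbol\phi}'\mathbf R\hat{\boldsymbol\phi}) \tag{A7.4.21a}$$

It is readily shown that $\sigma_a^2$ can be similarly written in terms of the theoretical autocorrelations:

$$\sigma_a^2 = \gamma_0(1 - \boldsymbol\rho'\boldsymbol\phi) = \gamma_0(1 - \boldsymbol\rho'\mathbf P_p^{-1}\boldsymbol\rho) = \gamma_0(1 - \boldsymbol\phi'\mathbf P_p\boldsymbol\phi) \tag{A7.4.21b}$$

agreeing with the result (3.2.8).

Parallel expressions for $\hat\sigma_a^2$ may be obtained for approximations 1, 2, and 3.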

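Information Matrix. Differentiating for a second time in (A7.4.11) and (A7.4.18), we obtain

$$-\frac{\partial^2 l}{\partial(\sigma_a^2)^2} = -\frac{n}{2(\sigma_a^2)^2} + \frac{S(\boldsymbol\phi)}{(\sigma_a^2)^3} \tag{A7.4.22a}$$

$$-\frac{\partial^2 l}{\partial\sigma_a^2\,\partial\phi_j} \simeq \sigma_a^{-2}\frac{\partial l}{\partial\phi_j} \tag{A7.4.22b}$$

$$-\frac{\partial^2 l}{\partial\phi_i\,\partial\phi_j} \simeq \frac{n}{\sigma_a^2}\,\frac{D_{i+1,j+1}}{n-i-j} \tag{A7.4.22c}$$

Now, since

$$E\left[\frac{\partial l}{\partial\phi_j}\right] = 0$$

it follows that for moderate or large samples,

$$E\left[-\frac{\partial^2 l}{\partial\sigma_a^2\,\partial\phi_j}\right] \simeq 0$$

and

$$|\mathbf I(\boldsymbol\phi, \sigma_a^2)| \simeq |\mathbf I(\boldsymbol\phi)|\,I(\sigma_a^2)$$

where

$$I(\sigma_a^2) = E\left[-\frac{\partial^2 l}{\partial(\sigma_a^2)^2}\right] = \frac{n}{2(\sigma_a^2)^2}$$

Now, using (A7.4.22c), we have

$$\mathbf I(\boldsymbol\phi) = -E\left[\frac{\partial^2 l}{\partial\phi_i\,\partial\phi_j}\right] \simeq \frac{n}{\sigma_a^2}\boldsymbol\Gamma_p = n(\mathbf M_p^{(p,0)})^{-1} \tag{A7.4.23}$$

Hence,

$$|\mathbf I(\boldsymbol\phi, \sigma_a^2)| \simeq \frac{n^{p+1}}{2(\sigma_a^2)^2}\,|\mathbf M_p^{(p,0)}|^{-1}$$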

Variances and Covariances of Estimates of Autoregressive Parameters. Now, in circumstances fully discussed by Whittle (1953), the inverse of the information matrix supplies the asymptotic variance-covariance matrix of the maximum likelihood (ML) estimates. Moreover, if the log-likelihood is approximately quadratic and the maximum is not close to a boundary, even if the sample size is only moderate, the elements of this matrix will normally provide adequate approximations to the variances and covariances of the estimates.

Thus, using (A7.4.23) and (A7.4.21b) gives

$$\begin{aligned}
\mathbf V(\hat{\boldsymbol\phi}) = \mathbf I^{-1}(\boldsymbol\phi) &\simeq n^{-1}\mathbf M_p^{(p,0)} = n^{-1}\sigma_a^2\boldsymbol\Gamma_p^{-1}\\
&= n^{-1}(1 - \boldsymbol\rho'\mathbf P_p^{-1}\boldsymbol\rho)\mathbf P_p^{-1}\\
&= n^{-1}(1 - \boldsymbol\phi'\mathbf P_p\boldsymbol\phi)\mathbf P_p^{-1} = n^{-1}(1 - \boldsymbol\rho'\boldsymbol\phi)\mathbf P_p^{-1}
\end{aligned} \tag{A7.4.24}$$

In particular, for autoregressive processes of first and second order,

$$V(\hat\phi) \simeq n^{-1}(1 - \phi^2)$$

$$\mathbf V(\hat\phi_1, \hat\phi_2) \simeq n^{-1}\begin{bmatrix}1 - \phi_2^2 & -\phi_1(1 + \phi_2)\\ -\phi_1(1 + \phi_2) & 1 - \phi_2^2\end{bmatrix} \tag{A7.4.25}$$

Estimates of the variances and covariances may be obtained by substituting estimates for the parameters in (A7.4.25). For example, we may substitute $r_j$'s for $\rho_j$'s and $\hat{\boldsymbol\phi}$ for $\boldsymbol\phi$ in (A7.4.24) to obtain

$$\hat{\mathbf V}(\hat{\boldsymbol\phi}) = n^{-1}(1 - \mathbf r'\hat{\boldsymbol\phi})\mathbf R^{-1} \tag{A7.4.26}$$
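For first- and second-order processes these large-sample results reduce to simple arithmetic: $(1 - \phi^2)/n$ for the AR(1) variance, and variances $(1 - \phi_2^2)/n$ with covariance $-\phi_1(1 + \phi_2)/n$ for AR(2). A minimal Python sketch follows; the function names are hypothetical, and in practice estimates would be substituted for the parameters.

```python
import math


def ar1_standard_error(phi, n):
    """Approximate standard error of the AR(1) estimate: sqrt((1 - phi^2)/n)."""
    return math.sqrt((1.0 - phi ** 2) / n)


def ar2_covariance(phi1, phi2, n):
    """Approximate 2 x 2 variance-covariance matrix of (phi1_hat, phi2_hat)."""
    var = (1.0 - phi2 ** 2) / n
    cov = -phi1 * (1.0 + phi2) / n
    return [[var, cov], [cov, var]]


# Usage with illustrative parameter values.
se_ar1 = ar1_standard_error(0.6, 64)
V = ar2_covariance(0.5, 0.3, 100)
correlation = V[0][1] / V[0][0]   # equals -phi1 / (1 - phi2)
```

The implied correlation between the two AR(2) estimates, $-\phi_1/(1 - \phi_2)$, is the negative of $\rho_1$, which is why the two estimates can be strongly correlated when $\rho_1$ is large.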


APPENDIX A7.5 ASYMPTOTIC DISTRIBUTION OF ESTIMATORS FOR 
AUTOREGRESSIVE MODELS 

We provide details on the asymptotic distribution of the least-squares estimator of the parameters $\boldsymbol\phi = (\phi_1, \ldots, \phi_p)'$ for a stationary AR($p$) model [i.e., all roots of $\phi(B) = 0$ lie outside the unit circle],

$$w_t = \sum_{i=1}^{p}\phi_i w_{t-i} + a_t$$

based on a sample of $n$ observations, where the $w_t$ are assumed to have mean $\mu = 0$ for simplicity, and the $a_t$ are assumed to be independent random variates, with zero means, variances $\sigma_a^2$, and finite fourth moments. It is then established that

$$n^{1/2}(\hat{\boldsymbol\phi} - \boldsymbol\phi) \xrightarrow{D} N\{\mathbf 0, \sigma_a^2\boldsymbol\Gamma_p^{-1}(\boldsymbol\phi)\} \tag{A7.5.1}$$

as $n \to \infty$, where $\boldsymbol\Gamma_p(\boldsymbol\phi)$ is the $p \times p$ autocovariance matrix of $p$ successive values from the AR($p$) process. Hence, for large $n$ the distribution of $\hat{\boldsymbol\phi}$ is approximately normal with mean vector $\boldsymbol\phi$ and covariance matrix $\mathbf V(\hat{\boldsymbol\phi}) \simeq n^{-1}\sigma_a^2\boldsymbol\Gamma_p^{-1}(\boldsymbol\phi)$, that is, $N\{\boldsymbol\phi, n^{-1}\sigma_a^2\boldsymbol\Gamma_p^{-1}(\boldsymbol\phi)\}$.

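We can write the AR($p$) model as

$$w_t = \mathbf w_{t-1}'\boldsymbol\phi + a_t \tag{A7.5.2}$$

where $\mathbf w_{t-1}' = (w_{t-1}, \ldots, w_{t-p})$. For convenience, assume that observations $w_{1-p}, \ldots, w_0$ are available in addition to $w_1, \ldots, w_n$, so that the (conditional) least-squares estimator of $\boldsymbol\phi$ is obtained by minimizing the sum of squares:

$$S(\boldsymbol\phi) = \sum_{t=1}^{n}(w_t - \mathbf w_{t-1}'\boldsymbol\phi)^2$$

As $n \to \infty$, the treatment of the $p$ initial observations becomes negligible, so that conditional and unconditional LS estimators are asymptotically equivalent. From the standard results on LS estimates for regression models, we know that the LS estimate of $\boldsymbol\phi$ in the AR($p$) model (A7.5.2) is then given by

$$\hat{\boldsymbol\phi} = \left(\sum_{t=1}^{n}\mathbf w_{t-1}\mathbf w_{t-1}'\right)^{-1}\sum_{t=1}^{n}\mathbf w_{t-1}w_t \tag{A7.5.3}$$

Substituting the expression for $w_t$ from (A7.5.2) in (A7.5.3), we see that

$$\hat{\boldsymbol\phi} = \boldsymbol\phi + \left(\sum_{t=1}^{n}\mathbf w_{t-1}\mathbf w_{t-1}'\right)^{-1}\sum_{t=1}^{n}\mathbf w_{t-1}a_t$$

so that

$$n^{1/2}(\hat{\boldsymbol\phi} - \boldsymbol\phi) = \left(n^{-1}\sum_{t=1}^{n}\mathbf w_{t-1}\mathbf w_{t-1}'\right)^{-1} n^{-1/2}\sum_{t=1}^{n}\mathbf w_{t-1}a_t \tag{A7.5.4}$$

Notice that the information matrix for this model situation is simply

$$\mathbf I(\boldsymbol\phi) = -E\left[\frac{\partial^2 l}{\partial\boldsymbol\phi\,\partial\boldsymbol\phi'}\right] = \frac{1}{\sigma_a^2}\sum_{t=1}^{n}E[\mathbf w_{t-1}\mathbf w_{t-1}'] = \frac{n}{\sigma_a^2}\boldsymbol\Gamma_p(\boldsymbol\phi) \equiv n\mathbf I_1(\boldsymbol\phi)$$

so that $n\mathbf I^{-1}(\boldsymbol\phi) = \mathbf I_1^{-1}(\boldsymbol\phi) = \sigma_a^2\boldsymbol\Gamma_p^{-1}(\boldsymbol\phi)$ as appears in (A7.5.1).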

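We let $\mathbf U_t = \mathbf w_{t-1}a_t$ and argue that these terms have zero mean, covariance matrix $\sigma_a^2\boldsymbol\Gamma_p(\boldsymbol\phi)$, and are mutually uncorrelated. That is, noting that $\mathbf w_{t-1}$ and $a_t$ are independent (e.g., elements of $\mathbf w_{t-1}$ are functions of $a_{t-1}, a_{t-2}, \ldots$, independent of $a_t$), we have $E[\mathbf w_{t-1}a_t] = E[\mathbf w_{t-1}]E[a_t] = \mathbf 0$, and again by independence of the terms $a_t^2$ and $\mathbf w_{t-1}\mathbf w_{t-1}'$,

$$\operatorname{cov}[\mathbf w_{t-1}a_t] = E[\mathbf w_{t-1}a_t a_t\mathbf w_{t-1}'] = E[a_t^2]E[\mathbf w_{t-1}\mathbf w_{t-1}'] = \sigma_a^2\boldsymbol\Gamma_p(\boldsymbol\phi)$$

In addition, for any $l > 0$,

$$\begin{aligned}
\operatorname{cov}[\mathbf w_{t-1}a_t, \mathbf w_{t+l-1}a_{t+l}] &= E[\mathbf w_{t-1}a_t a_{t+l}\mathbf w_{t+l-1}']\\
&= E[a_t\mathbf w_{t-1}\mathbf w_{t+l-1}']E[a_{t+l}] = \mathbf 0
\end{aligned}$$

because $a_{t+l}$ is independent of the other terms. By similar reasoning,

$$\operatorname{cov}[\mathbf w_{t-1}a_t, \mathbf w_{t+l-1}a_{t+l}] = \mathbf 0$$

for any $l < 0$. Hence, the quantity $\sum_{t=1}^{n}\mathbf w_{t-1}a_t$ in (A7.5.4) is the sum of $n$ uncorrelated terms each with zero mean and covariance matrix $\sigma_a^2\boldsymbol\Gamma_p(\boldsymbol\phi)$.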

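Now, in fact, the partial sums

$$\mathbf S_n = \sum_{t=1}^{n}\mathbf U_t \equiv \sum_{t=1}^{n}\mathbf w_{t-1}a_t \qquad n = 1, 2, \ldots$$

form a martingale sequence (with respect to the $\sigma$-fields generated by the collection of random variables $\{a_n, a_{n-1}, \ldots\}$), characterized by the property that $E[\mathbf S_{n+1} \mid a_n, a_{n-1}, \ldots] = \mathbf S_n$. This clearly holds since $\mathbf S_{n+1} = \mathbf w_n a_{n+1} + \mathbf S_n$,

$$E[\mathbf w_n a_{n+1} \mid a_n, a_{n-1}, \ldots] = \mathbf w_n E[a_{n+1} \mid a_n, a_{n-1}, \ldots] = \mathbf w_n E[a_{n+1}] = \mathbf 0$$

and $\mathbf S_n = \sum_{t=1}^{n}\mathbf w_{t-1}a_t$ is a function of $a_n, a_{n-1}, \ldots$ so that $E[\mathbf S_n \mid a_n, a_{n-1}, \ldots] = \mathbf S_n$. In this context, the terms $\mathbf U_t = \mathbf w_{t-1}a_t$ are referred to as a martingale difference sequence. Then, by a martingale central limit theorem (e.g., Billingsley, 1999),

$$n^{-1/2}\mathbf c'\mathbf S_n \xrightarrow{D} N\{0, \sigma_a^2\mathbf c'\boldsymbol\Gamma_p(\boldsymbol\phi)\mathbf c\}$$

for any vector of constants $\mathbf c' = (c_1, \ldots, c_p)$, and by use of the Cramér-Wold device, it follows that

$$n^{-1/2}\mathbf S_n = n^{-1/2}\sum_{t=1}^{n}\mathbf w_{t-1}a_t \xrightarrow{D} N\{\mathbf 0, \sigma_a^2\boldsymbol\Gamma_p(\boldsymbol\phi)\} \tag{A7.5.5}$$

as $n \to \infty$. Also, we know that the matrix $n^{-1}\sum_{t=1}^{n}\mathbf w_{t-1}\mathbf w_{t-1}' \xrightarrow{P} \boldsymbol\Gamma_p(\boldsymbol\phi)$, as $n \to \infty$, by a weak law of large numbers, since the $(i, j)$th element of the matrix is $\hat\gamma(i-j) = n^{-1}\sum_{t=1}^{n}w_{t-i}w_{t-j}$, which converges in probability to $\gamma(i-j)$ by consistency of sample autocovariances $\hat\gamma(i-j)$. Hence, it follows by continuity that

$$\left(n^{-1}\sum_{t=1}^{n}\mathbf w_{t-1}\mathbf w_{t-1}'\right)^{-1} \xrightarrow{P} \boldsymbol\Gamma_p^{-1}(\boldsymbol\phi) \tag{A7.5.6}$$

Therefore, by a standard limit theory result, applying (A7.5.5) and (A7.5.6) in (A7.5.4), we obtain that

$$n^{1/2}(\hat{\boldsymbol\phi} - \boldsymbol\phi) \xrightarrow{D} \boldsymbol\Gamma_p^{-1}(\boldsymbol\phi)N\{\mathbf 0, \sigma_a^2\boldsymbol\Gamma_p(\boldsymbol\phi)\} \tag{A7.5.7}$$

which leads to the result (A7.5.1).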
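The limiting result can be checked informally by simulation. For p = 1, $\sigma_a^2\Gamma_1^{-1} = 1 - \phi^2$, so $n$ times the sampling variance of the least-squares estimate should be close to $1 - \phi^2$. The following Python sketch is only an illustration: the Gaussian shocks, the burn-in length, the seed, and all function names are assumptions made for the example.

```python
import random


def simulate_ar1(phi, n, rng, burn_in=100):
    """Simulate n values of w_t = phi * w_{t-1} + a_t with N(0, 1) shocks."""
    z = 0.0
    out = []
    for t in range(n + burn_in):
        z = phi * z + rng.gauss(0.0, 1.0)
        if t >= burn_in:          # discard start-up values
            out.append(z)
    return out


def ls_ar1(w):
    """Conditional least-squares estimate: regress w_t on w_{t-1}."""
    num = sum(w[t] * w[t - 1] for t in range(1, len(w)))
    den = sum(w[t - 1] ** 2 for t in range(1, len(w)))
    return num / den


def scaled_variance(phi, n, reps, seed=0):
    """Return n * (sample variance of the LS estimate over reps replications)."""
    rng = random.Random(seed)
    est = [ls_ar1(simulate_ar1(phi, n, rng)) for _ in range(reps)]
    mean = sum(est) / reps
    var = sum((e - mean) ** 2 for e in est) / (reps - 1)
    return n * var


# Usage: compare with the theoretical value 1 - 0.5**2 = 0.75.
v = scaled_variance(0.5, 200, 400)
```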

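In addition, it is easily shown that the Yule-Walker (YW) estimator $\tilde{\boldsymbol\phi} = \mathbf R^{-1}\mathbf r$, discussed in Section 7.3.1, is asymptotically equivalent to the LS estimator considered here, in the sense that

$$n^{1/2}(\hat{\boldsymbol\phi} - \tilde{\boldsymbol\phi}) \xrightarrow{P} \mathbf 0$$

as $n \to \infty$. For instance, we can write the YW estimate as $\tilde{\boldsymbol\phi} = \hat{\boldsymbol\Gamma}_p^{-1}\hat{\boldsymbol\gamma}_p$ where $\hat{\boldsymbol\Gamma}_p = \hat\gamma_0\mathbf R$ and $\hat{\boldsymbol\gamma}_p = \hat\gamma_0\mathbf r$. For notational convenience, we write the LS estimate in (A7.5.3) as $\hat{\boldsymbol\phi} = \tilde{\boldsymbol\Gamma}_p^{-1}\tilde{\boldsymbol\gamma}_p$ where we denote $\tilde{\boldsymbol\Gamma}_p = n^{-1}\sum_{t=1}^{n}\mathbf w_{t-1}\mathbf w_{t-1}'$ and $\tilde{\boldsymbol\gamma}_p = n^{-1}\sum_{t=1}^{n}\mathbf w_{t-1}w_t$. Then, we have

$$\begin{aligned}
n^{1/2}(\hat{\boldsymbol\phi} - \tilde{\boldsymbol\phi}) &= n^{1/2}(\tilde{\boldsymbol\Gamma}_p^{-1}\tilde{\boldsymbol\gamma}_p - \hat{\boldsymbol\Gamma}_p^{-1}\hat{\boldsymbol\gamma}_p)\\
&= n^{1/2}\tilde{\boldsymbol\Gamma}_p^{-1}(\tilde{\boldsymbol\gamma}_p - \hat{\boldsymbol\gamma}_p) + n^{1/2}(\tilde{\boldsymbol\Gamma}_p^{-1} - \hat{\boldsymbol\Gamma}_p^{-1})\hat{\boldsymbol\gamma}_p
\end{aligned} \tag{A7.5.8}$$

and we can readily determine that both $n^{1/2}(\tilde{\boldsymbol\gamma}_p - \hat{\boldsymbol\gamma}_p) \xrightarrow{P} \mathbf 0$ and $n^{1/2}(\tilde{\boldsymbol\Gamma}_p - \hat{\boldsymbol\Gamma}_p) \xrightarrow{P} \mathbf 0$ as $n \to \infty$, and consequently also

$$n^{1/2}(\tilde{\boldsymbol\Gamma}_p^{-1} - \hat{\boldsymbol\Gamma}_p^{-1}) = \tilde{\boldsymbol\Gamma}_p^{-1}\,n^{1/2}(\hat{\boldsymbol\Gamma}_p - \tilde{\boldsymbol\Gamma}_p)\,\hat{\boldsymbol\Gamma}_p^{-1} \xrightarrow{P} \mathbf 0$$

Therefore, $n^{1/2}(\hat{\boldsymbol\phi} - \tilde{\boldsymbol\phi}) \xrightarrow{P} \mathbf 0$ follows directly from (A7.5.8).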
APPENDIX A7.6 EXAMPLES OF THE EFFECT OF PARAMETER
ESTIMATION ERRORS ON VARIANCES OF FORECAST ERRORS
AND PROBABILITY LIMITS FOR FORECASTS

The variances and probability limits for the forecasts given in Section 5.2.4 are based on the
assumption that the parameters $(\boldsymbol\phi, \boldsymbol\theta)$ in the ARIMA model are known exactly. In practice,
it is necessary to replace these by their estimates $(\hat{\boldsymbol\phi}, \hat{\boldsymbol\theta})$. To gain some insight into the effect
of estimation errors on the variance of the forecast errors, we consider the special cases of
the nonstationary IMA(0, 1, 1) and the stationary first-order autoregressive processes. It
is shown that for these processes and for parameter estimates based on series of moderate
length, the effect of such estimation errors is small.

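IMA(0,1,1) Processes. Writing the model $\nabla z_t = a_t - \theta a_{t-1}$ for $t + l, t + l - 1, \ldots, t + 1$, and summing, we obtain

$$z_{t+l} - z_t = a_{t+l} + (1 - \theta)(a_{t+l-1} + \cdots + a_{t+1}) - \theta a_t$$

Denote by $\hat z_t(l \mid \theta)$ the lead $l$ forecast when the parameter $\theta$ is known exactly. On taking conditional expectations at time $t$, for $l = 1, 2, \ldots$, we obtain

$$\begin{aligned}
\hat z_t(1 \mid \theta) &= z_t - \theta a_t\\
\hat z_t(l \mid \theta) &= \hat z_t(1 \mid \theta) \qquad l \ge 2
\end{aligned}$$

Hence, the lead $l$ forecast error is

$$\begin{aligned}
e_t(l \mid \theta) &= z_{t+l} - \hat z_t(l \mid \theta)\\
&= a_{t+l} + (1 - \theta)(a_{t+l-1} + \cdots + a_{t+1})
\end{aligned}$$

and the variance of the forecast error at lead time $l$ is

$$V(l) = E_t[e_t^2(l \mid \theta)] = \sigma_a^2[1 + (l - 1)\lambda^2] \tag{A7.6.1}$$

where $\lambda = 1 - \theta$.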

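However, if $\theta$ is replaced by its estimate $\hat\theta$, obtained from a time series consisting of $n$ values of $w_t = \nabla z_t$, then,

$$\begin{aligned}
\hat z_t(1 \mid \hat\theta) &= z_t - \hat\theta\hat a_t\\
\hat z_t(l \mid \hat\theta) &= \hat z_t(1 \mid \hat\theta) \qquad l \ge 2
\end{aligned}$$

where $\hat a_t = z_t - \hat z_{t-1}(1 \mid \hat\theta)$. Hence, the lead $l$ forecast error using $\hat\theta$ is

$$\begin{aligned}
e_t(l \mid \hat\theta) &= z_{t+l} - \hat z_t(l \mid \hat\theta)\\
&= z_{t+l} - z_t + \hat\theta\hat a_t\\
&= e_t(l \mid \theta) - (\theta a_t - \hat\theta\hat a_t)
\end{aligned} \tag{A7.6.2}$$

Since $\nabla z_t = (1 - \theta B)a_t = (1 - \hat\theta B)\hat a_t$, it follows that

$$\hat a_t = \frac{1 - \theta B}{1 - \hat\theta B}\,a_t$$

and on eliminating $\hat a_t$ from (A7.6.2), we obtain

$$e_t(l \mid \hat\theta) = e_t(l \mid \theta) - \frac{\theta - \hat\theta}{1 - \hat\theta B}\,a_t$$

Now,

$$\begin{aligned}
\frac{\theta - \hat\theta}{1 - \hat\theta B}\,a_t &= \frac{\theta - \hat\theta}{1 - \theta B}\left[1 - \frac{(\hat\theta - \theta)B}{1 - \theta B}\right]^{-1}a_t\\
&\simeq \frac{\theta - \hat\theta}{1 - \theta B}\left[1 + \frac{(\hat\theta - \theta)B}{1 - \theta B}\right]a_t\\
&= (\theta - \hat\theta)(a_t + \theta a_{t-1} + \theta^2 a_{t-2} + \cdots)\\
&\quad - (\theta - \hat\theta)^2(a_{t-1} + 2\theta a_{t-2} + 3\theta^2 a_{t-3} + \cdots)
\end{aligned} \tag{A7.6.3}$$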


On the assumption that the forecast and the estimate $\hat\theta$ are based on essentially nonoverlapping data, $\hat\theta$ and $a_t, a_{t-1}, \ldots$ are independent. Also, $\hat\theta$ will be approximately normally distributed about $\theta$ with variance $(1 - \theta^2)/n$, for moderate-sized samples. On these assumptions the variance of the expression in (A7.6.3) may be shown to be

$$\frac{\sigma_a^2}{n}\left[1 + \frac{3}{n}\,\frac{1 + \theta^2}{1 - \theta^2}\right]$$

Thus, provided that $|\theta|$ is not close to unity,

$$\operatorname{var}[e_t(l \mid \hat\theta)] \simeq \sigma_a^2[1 + (l - 1)\lambda^2] + \frac{\sigma_a^2}{n} \tag{A7.6.4}$$

Clearly, the proportional change in the variance will be greatest for $l = 1$, when the exact forecast error variance reduces to $\sigma_a^2$. In this case, for parameter estimates based on a series of moderate length, the forecast error variance will be increased by a factor $(n + 1)/n$, so that the probability limits, which are proportional to the standard deviation, will be widened by a factor $[(n + 1)/n]^{1/2}$.

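First-Order Autoregressive Processes. Writing the AR(1) model $z_t = \phi z_{t-1} + a_t$ at time $t + l$ and taking conditional expectations at time $t$, the lead $l$ forecast, given the true value of the parameter $\phi$, is

$$\hat z_t(l \mid \phi) = \phi\hat z_t(l - 1 \mid \phi) = \phi^l z_t$$

Similarly,

$$\hat z_t(l \mid \hat\phi) = \hat\phi\hat z_t(l - 1 \mid \hat\phi) = \hat\phi^l z_t$$

and hence

$$e_t(l \mid \hat\phi) = z_{t+l} - \hat z_t(l \mid \hat\phi) = e_t(l \mid \phi) + (\phi^l - \hat\phi^l)z_t \tag{A7.6.5}$$

Because $e_t(l \mid \phi) = z_{t+l} - \hat z_t(l \mid \phi) = a_{t+l} + \phi a_{t+l-1} + \cdots + \phi^{l-1}a_{t+1}$ is independent of $\hat\phi$ and $z_t$, it follows from (A7.6.5) that

$$E[e_t^2(l \mid \hat\phi)] = E[e_t^2(l \mid \phi)] + E[z_t^2(\phi^l - \hat\phi^l)^2]$$

Again, as in the MA(1) case, the estimate $\hat\phi$ is assumed to be essentially independent of $z_t$, and for sufficiently large $n$, $\hat\phi$ will be approximately normally distributed about a mean $\phi$ with variance $(1 - \phi^2)/n$. So using (5.4.16) and $E[z_t^2(\phi^l - \hat\phi^l)^2] \simeq E[z_t^2]E[(\phi^l - \hat\phi^l)^2]$, with $E[z_t^2] = \gamma_0 = \sigma_a^2/(1 - \phi^2)$, on the average

$$\operatorname{var}[e_t(l \mid \hat\phi)] \simeq \sigma_a^2\,\frac{1 - \phi^{2l}}{1 - \phi^2} + \sigma_a^2\,\frac{E[(\phi^l - \hat\phi^l)^2]}{1 - \phi^2} \tag{A7.6.6}$$

When $l = 1$, using $E[(\phi - \hat\phi)^2] \simeq (1 - \phi^2)/n$,

$$\operatorname{var}[e_t(1 \mid \hat\phi)] \simeq \sigma_a^2 + \frac{\sigma_a^2}{1 - \phi^2}\,\frac{1 - \phi^2}{n} = \sigma_a^2\left(1 + \frac{1}{n}\right) \tag{A7.6.7}$$

For $l > 1$, we have

$$\phi^l - \hat\phi^l = \phi^l - \{\phi - (\phi - \hat\phi)\}^l \simeq \phi^l - \{\phi^l - l\phi^{l-1}(\phi - \hat\phi)\} = l\phi^{l-1}(\phi - \hat\phi)$$

since the remaining terms involving $(\phi - \hat\phi)^j$ for $j = 2, \ldots, l$ are of smaller order. Thus, on the average, from (A7.6.6) we obtain

$$\begin{aligned}
\operatorname{var}[e_t(l \mid \hat\phi)] &\simeq \operatorname{var}[e_t(l \mid \phi)] + \frac{\sigma_a^2}{1 - \phi^2}\,l^2\phi^{2(l-1)}E[(\phi - \hat\phi)^2]\\
&= \operatorname{var}[e_t(l \mid \phi)] + \frac{l^2\phi^{2(l-1)}}{n}\,\sigma_a^2
\end{aligned}$$

and the discrepancy is again of order $n^{-1}$.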


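General-Order Autoregressive Processes. Related approximation results for the effect of parameter estimation errors on forecast error variances have been given by Yamamoto (1976) for the general AR($p$) model. In particular, we briefly consider the approximation for one-step-ahead forecasts in the AR($p$) case. Write the model at time $t + 1$ as

$$z_{t+1} = \phi_1 z_t + \phi_2 z_{t-1} + \cdots + \phi_p z_{t+1-p} + a_{t+1} = \mathbf z_t'\boldsymbol\phi + a_{t+1}$$

where $\mathbf z_t' = (z_t, z_{t-1}, \ldots, z_{t+1-p})$ and $\boldsymbol\phi' = (\phi_1, \phi_2, \ldots, \phi_p)$. Then,

$$\hat z_t(1 \mid \boldsymbol\phi) = \phi_1 z_t + \phi_2 z_{t-1} + \cdots + \phi_p z_{t+1-p} = \mathbf z_t'\boldsymbol\phi$$

and similarly, $\hat z_t(1 \mid \hat{\boldsymbol\phi}) = \mathbf z_t'\hat{\boldsymbol\phi}$, where $\hat{\boldsymbol\phi}$ is the ML estimate of $\boldsymbol\phi$ based on $n$ observations. Hence,

$$e_t(1 \mid \hat{\boldsymbol\phi}) = e_t(1 \mid \boldsymbol\phi) + \mathbf z_t'(\boldsymbol\phi - \hat{\boldsymbol\phi}) \tag{A7.6.8}$$

Using similar independence properties as above, as well as $\operatorname{cov}[\mathbf z_t] = \boldsymbol\Gamma_p$ and the asymptotic distribution approximation for $\hat{\boldsymbol\phi}$ (see, e.g., [7.2.19] and [A7.4.23]) that $\operatorname{cov}[\hat{\boldsymbol\phi}] \simeq n^{-1}\sigma_a^2\boldsymbol\Gamma_p^{-1}$, it follows that

$$\begin{aligned}
E[e_t^2(1 \mid \hat{\boldsymbol\phi})] &= E[e_t^2(1 \mid \boldsymbol\phi)] + E[\{\mathbf z_t'(\boldsymbol\phi - \hat{\boldsymbol\phi})\}^2]\\
&\simeq \sigma_a^2 + \operatorname{tr}\{E[\mathbf z_t\mathbf z_t']E[(\boldsymbol\phi - \hat{\boldsymbol\phi})(\boldsymbol\phi - \hat{\boldsymbol\phi})']\}\\
&\simeq \sigma_a^2 + \operatorname{tr}\{\boldsymbol\Gamma_p n^{-1}\sigma_a^2\boldsymbol\Gamma_p^{-1}\}
\end{aligned}$$

Thus, the approximation for one-step-ahead forecast error variance,

$$\operatorname{var}[e_t(1 \mid \hat{\boldsymbol\phi})] \simeq \sigma_a^2\left(1 + \frac{p}{n}\right) \tag{A7.6.9}$$

is readily obtained for the AR model of order $p$.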
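The approximations in this appendix are easy to evaluate numerically. The Python sketch below collects the IMA(0, 1, 1) result (A7.6.4), the AR(1) results, and the AR($p$) one-step result (A7.6.9); the function names and the convention that `n=None` means "parameters known exactly" are choices made for this illustration only.

```python
def ima_forecast_variance(theta, sigma2, lead, n=None):
    """IMA(0,1,1): sigma2 * [1 + (l - 1) * lambda^2], plus sigma2 / n if estimated."""
    lam = 1.0 - theta
    exact = sigma2 * (1.0 + (lead - 1) * lam ** 2)
    return exact if n is None else exact + sigma2 / n


def ar1_forecast_variance(phi, sigma2, lead, n=None):
    """AR(1): exact lead-l variance, plus sigma2 * l^2 * phi^(2(l-1)) / n if estimated."""
    exact = sigma2 * (1.0 - phi ** (2 * lead)) / (1.0 - phi ** 2)
    if n is None:
        return exact
    return exact + sigma2 * lead ** 2 * phi ** (2 * (lead - 1)) / n


def arp_one_step_variance(sigma2, p, n):
    """AR(p), one step ahead: sigma2 * (1 + p / n)."""
    return sigma2 * (1.0 + p / n)


# Usage: relative inflation of the one-step variance for n = 100.
inflation = ar1_forecast_variance(0.5, 1.0, 1, n=100) / ar1_forecast_variance(0.5, 1.0, 1)
```

For a series of 100 observations the one-step variance of an AR(1) forecast is inflated by about 1%, which illustrates why the effect is regarded as small for series of moderate length.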


APPENDIX A7.7 SPECIAL NOTE ON ESTIMATION OF MOVING AVERAGE 
PARAMETERS 

If the least-squares iteration that involves moving average parameters is allowed to stray
outside the invertibility region, parameter values can readily be found that apparently
provide sums of squares smaller than the true minimum. However, these do not provide
appropriate estimates and are quite meaningless. To illustrate, suppose that a series has
been generated by the first-order moving average model $w_t = (1 - \theta B)a_t$ with $-1 < \theta < 1$.
Then, the series could equally well have been generated by the corresponding backward
process $w_t = (1 - \theta F)e_t$ with $\sigma_e^2 = \sigma_a^2$. Now, the latter process can also be written as
$w_t = (1 - \theta^{-1}B)\alpha_t$, where now $\theta^{-1}$ is outside the invertibility region. However, in this
representation $\sigma_\alpha^2 = \sigma_a^2\theta^2$ and is itself a function of $\theta$. Therefore, a valid estimate of $\theta^{-1}$
will not be provided by minimizing $\sum_t \alpha_t^2 = \theta^2\sum_t a_t^2$. Indeed, this has its minimum at
$\theta^{-1} = \infty$.

The difficulty may be avoided: 


1. By using as starting values rough preliminary estimates within the invertibility region 
obtained at the identification stage. 

2. By checking that all moving average estimates, obtained after convergence has apparently occurred, lie within the invertibility region.


It is also possible to write least-squares programs such that estimates are constrained to 
lie within the invertibility region, and to check that moving average estimates lie within the 
invertibility region after each step of the iterative least-squares estimation procedure. 
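The equivalence of the two MA(1) representations can be verified from their autocovariances, $\gamma_0 = (1 + \theta^2)\sigma^2$ and $\gamma_1 = -\theta\sigma^2$. The short Python sketch below is illustrative only; the function names are hypothetical, and the invertibility check covers just the first-order case $|\theta| < 1$.

```python
def ma1_autocovariances(theta, sigma2):
    """Return (gamma_0, gamma_1) for w_t = (1 - theta * B) a_t."""
    return (1.0 + theta ** 2) * sigma2, -theta * sigma2


def is_invertible_ma1(theta):
    """An MA(1) parameter is in the invertibility region when |theta| < 1."""
    return abs(theta) < 1.0


theta, sigma2_a = 0.5, 1.0
# Invertible form: parameter theta, shock variance sigma2_a.
invertible = ma1_autocovariances(theta, sigma2_a)
# Noninvertible form: parameter 1/theta, shock variance sigma2_a * theta^2.
noninvertible = ma1_autocovariances(1.0 / theta, sigma2_a * theta ** 2)
```

Both forms give the same autocovariances, so the data alone cannot separate them; only the accompanying shock variance differs, which is why the sum of squares must not be minimized freely outside the invertibility region.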


EXERCISES 

7.1. The following table shows calculations for an (unrealistically short) series $z_t$ for which the (0, 1, 1) model $w_t = \nabla z_t = (1 - \theta B)a_t$ is being considered with $\theta = -0.5$ and with an unknown starting value $a_0$.

  t    z_t    w_t = ∇z_t    a_t = w_t - 0.5a_{t-1}
  0    40                   a_0
  1    42      2             2 - 0.50a_0
  2    47      5             4 + 0.25a_0
  3    47      0            -2 - 0.13a_0
  4    52      5             6 + 0.06a_0
  5    51     -1            -4 - 0.03a_0
  6    57      6             8 + 0.02a_0
  7    59      2            -2 - 0.01a_0

(a) Confirm the entries in the table.

(b) Show that the conditional sum of squares is

$$\sum_{t=1}^{7}(a_t \mid -0.5, a_0 = 0)^2 = S_*(-0.5 \mid 0) = 144.00$$

7.2. Using the data in Exercise 7.1:

(a) Show (using least-squares) that the value $\hat a_0$ of $a_0$ that minimizes $S_*(-0.5 \mid a_0)$ is

$$\hat a_0 = \frac{(2)(0.50) + (4)(-0.25) + \cdots + (-2)(0.0078)}{1^2 + 0.5^2 + \cdots + 0.0078^2} = -\frac{\sum_{t=0}^{n}\theta^t a_t^0}{\sum_{t=0}^{n}\theta^{2t}}$$

where $a_t^0 = (a_t \mid \theta, a_0 = 0)$ are the conditional values. Compare this expression for $\hat a_0$ with that for the exact back-forecast $[a_0]$ in the MA(1) model, where the expression for $[a_0]$ is given preceding the equation (A7.3.9) in Appendix A7.3, and verify that the two expressions are identical.

(b) By first writing this model in the backward form $w_t = (1 - \theta F)e_t$ and recursively computing the $e$'s, show that the value of $a_0$ obtained in (a) is the same as that obtained by the back-forecasting method.

7.3. Using the value of $\hat a_0$ calculated in Exercise 7.2:

(a) Show that the unconditional sum of squares $S(-0.5)$ is 143.4.

(b) Show that for the (0, 1, 1) model, for large $n$,

$$S(\theta) = S_*(\theta \mid 0) - \frac{\hat a_0^2}{1 - \theta^2}$$

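7.4. For the process $w_t = \mu_w + (1 - \theta B)a_t$ show that for long series the variance-covariance matrix of the maximum likelihood estimates $\hat\mu_w$, $\hat\theta$ is approximately

$$n^{-1}\begin{bmatrix}(1 - \theta)^2\sigma_a^2 & 0\\ 0 & 1 - \theta^2\end{bmatrix}$$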

7.5. (a) Problems were experienced in obtaining a satisfactory fit to a series, the last 16 values of which were recorded as follows:

129, 135, 130, 130, 127, 126, 131, 152,
123, 124, 131, 132, 129, 127, 126, 124

Plot the series and suggest where the difficulty might lie.

(b) In fitting a model of the form $(1 - \phi_1 B - \phi_2 B^2)z_t = (1 - \theta B)a_t$ to a set of data, convergence was slow and the coefficient estimates in successive iterations oscillated wildly. Final estimates having large standard errors were obtained as follows: $\hat\phi_1 = 1.19$, $\hat\phi_2 = -0.34$, $\hat\theta = 0.52$. Can you suggest an explanation for the unstable behavior of the model? Why should preliminary identification have eliminated the problem?

(c) In fitting the model $\nabla^2 z_t = (1 - \theta_1 B - \theta_2 B^2)a_t$ convergence was not obtained. The last iteration yielded the values $\hat\theta_1 = 1.81$, $\hat\theta_2 = 0.52$. Can you explain the difficulty?

7.6. For the ARIMA(1, 1, 1) model $(1 - \phi B)w_t = (1 - \theta B)a_t$, where $w_t = \nabla z_t$:

(a) Write down the linearized form of the model.

(b) Set out how you would start off the calculation of the conditional nonlinear least-squares algorithm with start values $\phi = 0.5$ and $\theta = 0.4$ for a series whose first nine values are shown below.

  t    z_t    t    z_t
  0    149    5    150
  1    145    6    147
  2    152    7    142
  3    144    8    146
  4    150


7.7. (a) Show that the second-order autoregressive model $z_t = \phi_1 z_{t-1} + \phi_2 z_{t-2} + a_t$ may be written in orthogonal form as

$$z_t = \frac{\phi_1}{1 - \phi_2}\,z_{t-1} + \phi_2\left(z_{t-2} - \frac{\phi_1}{1 - \phi_2}\,z_{t-1}\right) + a_t$$

suggesting that the approximate estimates

$$r_1 \text{ of } \frac{\phi_1}{1 - \phi_2} \qquad\text{and}\qquad \hat\phi_2 = \frac{r_2 - r_1^2}{1 - r_1^2} \text{ of } \phi_2$$

are uncorrelated for long series.

(b) Starting from the variance-covariance matrix of $\hat\phi_1$ and $\hat\phi_2$ or otherwise, show that the variance-covariance matrix of $r_1$ and $\hat\phi_2$ for long series is given approximately by

$$n^{-1}\begin{bmatrix}(1 - \phi_2^2)(1 - \rho_1^2) & 0\\ 0 & 1 - \phi_2^2\end{bmatrix}$$


7.8. The preliminary model identification performed in Chapter 6 suggested that either an ARIMA(1, 1, 0) or an ARIMA(0, 2, 2) model might be appropriate for the chemical process temperature readings in Series C. The series is available for download from http://pages.stat.wisc.edu/~reinsel/bjr-data/.

(a) Estimate the parameters of the ARIMA(1, 1, 0) model for this series using R.

(b) Estimate the parameters of the ARIMA(0, 2, 2) model and compare the results with those in part (a).

7.9. Repeat the analysis in Exercise 7.8 by fitting (a) an AR(1) and (b) an ARIMA(0, 1, 1) model to the chemical process viscosity readings in Series D.

7.10. Daily air quality measurements in New York, from May to September 1973, are 
available in a file called ‘airquality’ in the R datasets package. The file provides 
data on four air quality variables: mean ozone levels at Roosevelt Island, solar 
radiation at Central Park, maximum daily temperature at La Guardia Airport, and 
average wind speeds at La Guardia Airport. 

(a) Identify suitable models for the daily temperature and wind speed series. 

(b) Estimate the parameters of selected models and comment. 

7.11. Consider the solar radiation series that is part of the New York airquality data file 
described in Problem 7.10. This series has a few missing values. 

(a) Impute suitable estimates of the missing values. (Note: A formal procedure for 
estimating missing values is described in Chapter 13, but is not needed here). 

(b) Identify a model for the resulting series. 

(c) Estimate the parameters of the selected model and comment.

7.12. Refer to the annual river flow measurements in the time series ‘Nile’ analyzed in 
Exercise 6.7. Estimate the parameters of the model or models identified for this time 
series and comment. 





MODEL DIAGNOSTIC CHECKING 


The model having been identified and the parameters estimated, diagnostic checks are then 
applied to the fitted model. One useful method of checking a model is to overfit, that is, to 
estimate the parameters in a model somewhat more general than that which we believe to be 
true. This method assumes that we can guess the direction in which the model is likely to be 
inadequate. Therefore, it is necessary to supplement this approach by less specific checks 
applied to the residuals from the fitted model. These allow the data themselves to suggest 
modifications to the model. In this chapter, we describe two such checks that employ 
(1) the autocorrelation function of the residuals and (2) the cumulative periodogram of the 
residuals. Some alternative diagnostic procedures are also discussed. Numerical examples 
are included to demonstrate the results. 


8.1 CHECKING THE STOCHASTIC MODEL 

8.1.1 General Philosophy 

Suppose that using a particular time series, the model has been identified and the parameters 
estimated using the methods described in Chapters 6 and 7. The question remains of deciding 
whether this model is adequate. If there is evidence of serious inadequacy, we need to know 
how the model should be modified in the next iterative cycle. What we are doing is described 
only partially by the words “testing goodness of fit.” We need to discover in what way a 
model is inadequate, so as to suggest appropriate modification. To illustrate, by reference 
to familiar procedures outside time series analysis, the scrutiny of residuals for the analysis 
of variance, described by Anscombe (1961) and Anscombe and Tukey (1963), and the 

Time Series Analysis: Forecasting and Control, Fifth Edition. George E. P. Box, Gwilym M. Jenkins, 
criticism of factorial experiments, leading to normal plotting and other methods, described
by Daniel (1959), would be called diagnostic checks. 

All models are approximations and no model form can ever represent the truth absolutely. 
Given sufficient data, statistical tests can discredit models that could nevertheless be entirely 
adequate for the purpose at hand. Alternatively, tests can fail to indicate serious departures 
from assumptions because of small sample sizes or because these tests are insensitive to the 
types of discrepancies that occur. The best policy is to devise the most sensitive statistical 
procedures possible but be prepared to employ models that exhibit slight lack of fit. If 
diagnostic checks, which have been thoughtfully devised, are applied to a model fitted to 
a reasonably large body of data and fail to show serious discrepancies, then we should feel 
comfortable using that model. 

8.1.2 Overfitting 

One technique that can be used for diagnostic checking is overfitting. Having identified 
what is believed to be a correct model, we actually fit a more elaborate one. This puts 
the identified model in jeopardy because the more elaborate model contains additional 
parameters covering feared directions of discrepancy. Careful thought should be given to 
the question of how the model should be augmented. In particular, in accordance with the 
discussion on model redundancy in Section 7.3.5, it would not make sense to add factors 
simultaneously to both sides of the ARMA model. Moreover, if the analysis fails to show 
that the additions are needed, we have, of course, not proved that our model is correct. A 
model is only capable of being “proved” in the biblical sense of being put to the test. As 
was recommended by Saint Paul in his first epistle to the Thessalonians, what we can do is 
to “Prove all things; hold fast to that which is good.” 

Example of Overfitting. As an example, we consider again some IBM stock price data. For this analysis, data were employed that are listed as Series B' in the Collection of Time Series in Part Five of this book. This series consists of IBM stock prices for the period¹ June 29, 1959-June 30, 1960. The (0, 1, 1) model

$$\nabla z_t = (1 - \theta B)a_t$$

with $\hat\lambda_0 = 1 - \hat\theta = 0.90$, was identified and fitted to the 255 available observations.

The (0, 1, 1) model can equally well be expressed in the form

$$\nabla z_t = \lambda_0 a_{t-1} + \nabla a_t$$

The extended model that was considered in the overfitting procedure was the (0, 3, 3) process

$$\nabla^3 z_t = (1 - \theta_1 B - \theta_2 B^2 - \theta_3 B^3)a_t$$

or using (4.3.21), in the form

$$\nabla^3 z_t = (\lambda_0\nabla^2 + \lambda_1\nabla + \lambda_2)a_{t-1} + \nabla^3 a_t$$

¹ The IBM stock data previously considered, referred to as Series B, cover a different period, May 17, 1961-November 2, 1962.





While this model may seem overly elaborate, the immediate motivation for extending the model in this particular way was to test a suggestion made by Brown (1962) that the series should be forecasted by an adaptive quadratic forecast function. Now, it was shown in Chapter 5 that an IMA(0, q, q) process has for its optimal forecasting function an adaptive polynomial of degree $q - 1$. Thus, for the extended (0, 3, 3) model above, the optimal lead $l$ forecast function is the quadratic polynomial in $l$:

$$\hat z_t(l) = b_0^{(t)} + b_1^{(t)}l + b_2^{(t)}l^2$$

where the coefficients $b_0^{(t)}$, $b_1^{(t)}$, and $b_2^{(t)}$ are adjusted as each new piece of data becomes available.

By comparison, the model we have identified is an IMA(0, 1, 1) process, which yields a forecast function

$$\hat z_t(l) = b_0^{(t)} \tag{8.1.1}$$

This is a "polynomial in $l$" of degree zero. Hence, the model implies that the forecast $\hat z_t(l)$ is independent of $l$, that is, the forecast at any particular time $t$ is the same for one step ahead, two steps ahead, and so on. In other words, the series contains information only on the future level of the series, and nothing about slope or curvature. At first sight, this is somewhat surprising because, using hindsight, quite definite linear and curvilinear trends appear to be present in the series. Therefore, it is worthwhile to check whether nonzero values of $\lambda_1$ and $\lambda_2$, which would produce predictable trends, actually occur. Sum-of-squares grids for $S(\lambda_1, \lambda_2 \mid \lambda_0)$ similar to those shown in Figure 7.2 were produced for $\lambda_0 = 0.7$, 0.9, and 1.1, which showed a minimum close to $\hat\lambda_0 = 0.9$, $\hat\lambda_1 = 0$, and $\hat\lambda_2 = 0$. It was clear that values of $\lambda_1 > 0$ and $\lambda_2 > 0$ lead to a higher sum of squares, and do not support augmenting the identified IMA(0, 1, 1) model in these directions. This implies, in particular, that a quadratic forecast function would give worse instead of better forecasts than those obtained from (8.1.1), as was indeed shown to be the case in Section A5.3.3.

Computations in R. Estimation of the parameters in the more elaborate IMA(0, 3, 3)
model for the IBM series using R also shows that the model can be simplified. The
relevant commands along with a partial model output are provided below:

> library(astsa)
> ibm2=read.table("ibm2.txt",header=TRUE)
> ibm.ts=ts(ibm2)
> sarima(ibm.ts,0,3,3)

Coefficients:
          ma1     ma2      ma3
      -2.0215  1.0686  -0.0469
s.e.   0.0705  0.1370   0.0692
sigma^2 estimated as 25.5

> polyroot(c(1,-2.0215,1.0686,-0.0469))
1.013484+0.005832i  1.013484-0.005832i  20.757680+0.000000i

> sarima(ibm.ts,0,1,1)

Coefficients:
          ma1  constant
      -0.0848    0.3028
s.e.   0.0634    0.2878
sigma^2 estimated as 25.1

We note that the parameter estimates $\hat\theta_1$ and $\hat\theta_2$ in the IMA(0, 3, 3) model are highly significant. However, the large estimates are introduced as compensation for overdifferencing by setting $d = 3$ in this model. This is confirmed by finding the roots of the moving average polynomial using the command polyroot() in R. The results, which are included above, show that two of the roots are very close to one. Hence, cancellation is possible, reducing the IMA(0, 3, 3) model to an IMA(0, 1, 1) model. The IMA(0, 1, 1) model also provides a slightly better fit to the data as can be seen from the smaller value of $\hat\sigma_a^2$ in the R output for this model.
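The near-unit roots can be confirmed without a root finder by evaluating the fitted moving average polynomial at the reported roots and at $B = 1$. The Python sketch below uses only the rounded coefficients and roots printed in the R output; the helper name `ma_poly` is hypothetical.

```python
def ma_poly(coeffs, z):
    """Evaluate c0 + c1*z + c2*z^2 + ... by Horner's rule (z may be complex)."""
    value = 0j
    for c in reversed(coeffs):
        value = value * z + c
    return value


# Coefficients in the order passed to polyroot() in the R session.
coeffs = [1.0, -2.0215, 1.0686, -0.0469]
reported_roots = [
    complex(1.013484, 0.005832),
    complex(1.013484, -0.005832),
    complex(20.757680, 0.0),
]

residuals = [abs(ma_poly(coeffs, z)) for z in reported_roots]   # all close to zero
moduli = [abs(z) for z in reported_roots]                       # two close to one
value_at_one = ma_poly(coeffs, 1.0).real                        # also close to zero
```

The polynomial nearly vanishes at $B = 1$, which is the numerical signature of overdifferencing: the factor $(1 - B)^2$ on the moving average side approximately cancels two of the three differences.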


8.2 DIAGNOSTIC CHECKS APPLIED TO RESIDUALS 

The method of overfitting, by extending the model in a particular direction, assumes that 
we know what kind of discrepancies are to be feared. Procedures less dependent upon such 
knowledge are based on the analysis of residuals. It cannot be too strongly emphasized that 
visual inspection of a plot of the residuals themselves is an indispensable first step in the 
checking process. 

8.2.1 Autocorrelation Check 

Suppose that a model $\phi(B)\tilde{w}_t = \theta(B)a_t$ has been fitted to the observed time series with
ML estimates $(\hat{\boldsymbol{\phi}}, \hat{\boldsymbol{\theta}})$ obtained for the parameters. The quantities

$$\hat{a}_t = \hat{\theta}^{-1}(B)\hat{\phi}(B)\tilde{w}_t \tag{8.2.1}$$

are then referred to as the residuals. The residuals are computed recursively from
$\hat{\theta}(B)\hat{a}_t = \hat{\phi}(B)\tilde{w}_t$ as

$$\hat{a}_t = \tilde{w}_t - \sum_{j=1}^{p} \hat{\phi}_j \tilde{w}_{t-j} + \sum_{j=1}^{q} \hat{\theta}_j \hat{a}_{t-j}, \qquad t = 1, 2, \ldots, n$$

using either zero initial values (conditional method) or back-forecasted initial values (exact
method) for the initial $\hat{a}$'s and $\tilde{w}$'s. Now, it is possible to show that, if the model is
adequate,

$$\hat{a}_t = a_t + O\left(\frac{1}{\sqrt{n}}\right)$$

As the series length increases, the $\hat{a}_t$'s become close to the white noise $a_t$'s. Therefore,
one might expect that study of the $\hat{a}_t$'s could indicate the existence and nature of model
inadequacy. In particular, recognizable patterns in the estimated autocorrelation function
of the $\hat{a}_t$'s could point to appropriate modifications in the model. This point is discussed
further in Section 8.3.
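The recursive calculation of the residuals can be sketched in a few lines of R. The function name `arma_residuals` is hypothetical, the sign convention follows the recursion above, and the conditional method (zero initial values) is assumed. The check uses simulated data with the true parameter values, so the shocks are recovered exactly; with estimated parameters the recovery would only be approximate.

```r
# Conditional residuals for an ARMA(p, q) model in the book's sign convention:
#   a_t = w_t - sum_j phi_j * w_{t-j} + sum_j theta_j * a_{t-j},
# with all pre-sample w's and a's set to zero.
arma_residuals = function(w, phi = numeric(0), theta = numeric(0)) {
  n = length(w)
  a = numeric(n)
  for (t in seq_len(n)) {
    ar_part = 0
    for (j in seq_along(phi)) if (t - j >= 1) ar_part = ar_part + phi[j] * w[t - j]
    ma_part = 0
    for (j in seq_along(theta)) if (t - j >= 1) ma_part = ma_part + theta[j] * a[t - j]
    a[t] = w[t] - ar_part + ma_part
  }
  a
}

# Illustrative check: generate w_t = phi*w_{t-1} + a_t - theta*a_{t-1} from
# known shocks (zero starting values), then recover the shocks.
set.seed(1)
a_true = rnorm(200)
phi = 0.6
theta = 0.4
w = numeric(200)
for (t in 1:200) {
  w[t] = a_true[t]
  if (t > 1) w[t] = w[t] + phi * w[t - 1] - theta * a_true[t - 1]
}
a_hat = arma_residuals(w, phi, theta)
print(max(abs(a_hat - a_true)))
```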

Now, suppose that the form of the model was correct and that we knew the true parameter
values $\boldsymbol{\phi}$ and $\boldsymbol{\theta}$. Then, using (2.1.13) and a result of Anderson (1942), the estimated
autocorrelations $r_k(a)$ of the $a_t$'s would be uncorrelated and distributed approximately
normally about zero with variance $n^{-1}$, and hence with a standard error of $n^{-1/2}$. We could
use these facts to assess approximately the statistical significance of apparent departures of
these autocorrelations from zero.

Now, in practice, we do not know the true parameter values. We have only the estimates
$(\hat{\boldsymbol{\phi}}, \hat{\boldsymbol{\theta}})$, from which, using (8.2.1), we can calculate not the $a_t$'s but the $\hat{a}_t$'s. The
autocorrelations $r_k(\hat{a})$ of the $\hat{a}_t$'s can yield valuable evidence concerning lack of fit and
the possible nature of model inadequacy. However, it was pointed out by Durbin (1970)
that it might be dangerous to assess the statistical significance of apparent discrepancies
of these autocorrelations $r_k(\hat{a})$ from their theoretical zero values on the basis of a standard
error $n^{-1/2}$, appropriate to the $r_k(a)$'s. Durbin was able to show, for example, that for the
AR(1) process with parameter $\phi$, the variance of $r_1(\hat{a})$ is $\phi^2 n^{-1}$, which can be substantially
smaller than $n^{-1}$. The large-sample variances and covariances for all the autocorrelations
of the $\hat{a}_t$'s from any ARMA process were subsequently derived by Box and Pierce (1970).
They showed that, while in all cases a reduction in variance can occur for low lags and
at these low lags the $r_k(\hat{a})$'s can be highly correlated, these effects usually disappear rather
quickly at high lags. Thus, the use of $n^{-1/2}$ as the standard error for $r_k(\hat{a})$ would underestimate
the statistical significance of apparent departures from zero of the autocorrelations
at low lags but could usually be employed for moderate or high lags.

For illustration, the large-sample one- and two-standard-error limits of the residual
autocorrelations $r_k(\hat{a})$, for two AR(1) processes and two AR(2) processes, are shown in
Figure 8.1. These also supply the corresponding approximate standard errors for moving
average processes with the same parameters as indicated in the figure. It is evident that,
except at moderately high lags, $n^{-1/2}$ provides an upper bound for the standard errors of
the $r_k(\hat{a})$'s rather than the standard errors themselves. If for low lags we use the standard
error $n^{-1/2}$ for the $r_k(\hat{a})$'s, we may seriously underestimate the significance of apparent
discrepancies.

[Figure 8.1 panels, each plotting limits at $\pm 1/\sqrt{n}$ and $\pm 2/\sqrt{n}$ against lag $k = 1, \ldots, 6$:
(a) AR(1), $\phi = 0.3$; MA(1), $\theta = 0.3$.
(b) AR(1), $\phi = 0.7$; MA(1), $\theta = 0.7$.
(c) AR(2), $\phi_1 = 0.5$, $\phi_2 = 0.25$; MA(2), $\theta_1 = 0.5$, $\theta_2 = 0.25$.
(d) AR(2), $\phi_1 = 1.0$, $\phi_2 = -0.75$; MA(2), $\theta_1 = 1.0$, $\theta_2 = -0.75$.]

FIGURE 8.1 Standard-error limits for residual autocorrelations $r_k(\hat{a})$.
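The shrinkage of the standard errors at low lags can be tabulated for the AR(1) case. The sketch below assumes the large-sample approximation $\mathrm{var}[r_k(\hat{a})] \approx n^{-1}[1 - \phi^{2(k-1)}(1 - \phi^2)]$, which reduces to Durbin's value $\phi^2/n$ at $k = 1$; the formula is quoted here as an assumption rather than taken from the text above, and the function name and the series length are illustrative.

```r
# Approximate large-sample standard error of the lag-k residual
# autocorrelation after fitting an AR(1) model with parameter phi.
# Assumed formula: var[r_k] ~ (1/n) * [1 - phi^(2(k-1)) * (1 - phi^2)]
resid_acf_se_ar1 = function(phi, n, k) {
  sqrt((1 - phi^(2 * (k - 1)) * (1 - phi^2)) / n)
}

n = 200          # illustrative series length
lags = 1:6
se_03 = resid_acf_se_ar1(0.3, n, lags)
se_07 = resid_acf_se_ar1(0.7, n, lags)

# Express each standard error as a fraction of the naive value 1/sqrt(n)
print(round(rbind(phi_0.3 = se_03, phi_0.7 = se_07) * sqrt(n), 3))
```

Under this approximation the ratio starts at $|\phi|$ for lag 1 and rises toward 1 as the lag increases, more slowly for the larger value of $\phi$.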

8.2.2 Portmanteau Lack-of-Fit Test 

In addition to considering the $r_k(\hat{a})$'s individually, an indication is often needed of whether,
say, the first 10-20 autocorrelations of the $\hat{a}_t$'s taken as a whole indicate inadequacy of the
model. Suppose that we have the first $K$ autocorrelations$^2$ $r_k(\hat{a})$ $(k = 1, 2, \ldots, K)$ from any
ARIMA($p$, $d$, $q$) model; then it is possible to show (Box and Pierce, 1970) that if the fitted
model is appropriate,

$$Q = n \sum_{k=1}^{K} r_k^2(\hat{a}) \tag{8.2.2}$$

is approximately distributed as $\chi^2(K - p - q)$, where $n = N - d$ is the number of $w$'s used
to fit the model. On the other hand, if the model is inappropriate, the average values of $Q$
will be inflated. Therefore, an approximate "portmanteau" test of the hypothesis of model
adequacy, designed to take account of the difficulties discussed above, may be made by
referring an observed value of $Q$ to the percentage points of this $\chi^2$ distribution.

However, Ljung and Box (1978) later showed that, for sample sizes common in practice,
the chi-squared distribution may not provide an adequate approximation to the distribution
of the statistic $Q$ under the null hypothesis, with the values of $Q$ tending to be somewhat
smaller than what is expected under the chi-squared distribution. Empirical evidence to
support this was also presented by Davies et al. (1977). Ljung and Box (1978) proposed a
modified form of the statistic,

$$\tilde{Q} = n(n + 2) \sum_{k=1}^{K} (n - k)^{-1} r_k^2(\hat{a}) \tag{8.2.3}$$

such that the modified statistic has, approximately, the mean $E[\tilde{Q}] \approx K - p - q$ of the
$\chi^2(K - p - q)$ distribution. The motivation for (8.2.3) is that a more accurate value for
the variance of $r_k(a)$ from a white noise series is $(n - k)/[n(n + 2)]$, rather than $1/n$ used in
(8.2.2). This modified form of the portmanteau test statistic has been recommended for use
as having a null distribution that is much closer to the $\chi^2(K - p - q)$ distribution for typical
sample sizes $n$. Because of its computationally convenient form, this statistic has been
implemented in many software packages and has become widely used in applied work.
We emphasize, however, that this statistic should not be used as a substitute for careful
examination of the residuals and their individual autocorrelation coefficients, and for other
diagnostic checks on the fitted model.
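Both forms of the portmanteau statistic are simple to compute from the residual autocorrelations. The following sketch uses base R only; the helper name `portmanteau` and the simulated AR(1) series are illustrative assumptions, not part of the original analysis. The result is cross-checked against `Box.test()` in the stats package, whose `fitdf` argument subtracts $p + q$ from the degrees of freedom.

```r
# Portmanteau statistics computed directly from residual autocorrelations.
# 'res' is a residual series, K the number of lags, and fitdf = p + q.
portmanteau = function(res, K, fitdf = 0) {
  n = length(res)
  r = acf(res, lag.max = K, plot = FALSE)$acf[-1]   # r_1, ..., r_K
  Q_bp = n * sum(r^2)                               # form (8.2.2)
  Q_lb = n * (n + 2) * sum(r^2 / (n - seq_len(K)))  # form (8.2.3)
  df = K - fitdf
  c(Q_bp = Q_bp, Q_lb = Q_lb, df = df,
    p_lb = pchisq(Q_lb, df, lower.tail = FALSE))
}

# Illustration on a simulated AR(1) series (not one of the book's series)
set.seed(42)
w = arima.sim(list(ar = 0.7), n = 200)
fit = arima(w, order = c(1, 0, 0), include.mean = FALSE)
out = portmanteau(residuals(fit), K = 20, fitdf = 1)

# Cross-check against stats::Box.test with the same degrees-of-freedom correction
bt = Box.test(residuals(fit), lag = 20, type = "Ljung-Box", fitdf = 1)
print(out)
print(bt)
```

Because each weight $(n + 2)/(n - k)$ exceeds one, the modified statistic is always larger than the original one for the same residuals.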

Remark. Diagnostic checks based on the residuals and their autocorrelation coefficients
are conveniently performed using R. Having fitted a model m1 to the observed series, the
command tsdiag(m1, gof.lag=20) provides a plot of the standardized residuals,
a plot of the first 20 residual autocorrelation coefficients, and a plot of the p-values for the
portmanteau statistic $\tilde{Q}$ for increasing values of $K$. However, while these diagnostics are
useful, it appears that the command tsdiag(), at present, determines p-values for $\tilde{Q}$ using
a chi-square distribution with $K$ rather than $K - p - q$ degrees of freedom. An alternative
is to use diagnostic tools in the R package astsa, where this problem does not appear. An
illustration of the use of this package is provided below.

$^2$ It is assumed here that $K$ is taken sufficiently large so that the weights $\psi_j$ in the model, written in the form
$\tilde{w}_t = \phi^{-1}(B)\theta(B)a_t = \psi(B)a_t$, will be negligibly small after $j = K$.

An Empirical Example. In Chapter 7, we examined two potential models for a time series
of chemical temperature readings referred to as Series C. The two models were (1) the
IMA(0, 2, 2) model $\nabla^2 z_t = (1 - 0.13B - 0.12B^2)a_t$ and (2) the ARIMA(1, 1, 0) model
$(1 - 0.82B)\nabla z_t = a_t$. It was decided that the second model gave a preferable representation
of the series. Model diagnostics for the IMA(0, 2, 2) model generated using R are provided
in Figure 8.2. These include graphs of the standardized residuals, the residual autocorrelation
coefficients $r_k(\hat{a})$ for lags $k = 1, \ldots, 25$, a normal Q-Q plot of the standardized
residuals, and a plot of the p-values for the portmanteau statistic $\tilde{Q}$ in (8.2.3) determined
for increasing values of $K$. The graph of the standardized residuals reveals some large
residuals around $t = 60$, but apart from that there are no issues. The Q-Q plot confirms
the presence of three large residuals but indicates that the normal approximation is adequate
otherwise.

Approximate two-standard-error upper bounds on the residual autocorrelation coefficients
are included in the graph of the autocorrelation function. Since there are $n = 224$
observations after differencing the series, the approximate upper bound for the standard
error of a single autocorrelation is $1/\sqrt{224} \approx 0.07$. While most of the individual
autocorrelations fall within the two-standard-error bounds, several values including $r_3(\hat{a})$, $r_9(\hat{a})$,
$r_{11}(\hat{a})$, $r_{17}(\hat{a})$, $r_{22}(\hat{a})$, and $r_{25}(\hat{a})$ are close to these bounds. Of course, occasional large
deviations occur even in random series, but taking these results as a whole, there is a suspicion
of some lack of fit. This is confirmed by examining the p-values of the portmanteau statistic
shown in the bottom graph of Figure 8.2. We note that most of the p-values are at or near
the 5% level, indicating some lack of fit. This is especially the case for the larger values of
$K$, where the chi-squared distribution is expected to provide a valid approximation.

[Figure 8.2 panels: standardized residuals; residual autocorrelation function; normal Q-Q plot
(theoretical quantiles); p values for Ljung-Box statistic.]

FIGURE 8.2 Model diagnostics for the ARIMA(0, 2, 2) model fitted to the temperature readings
in Series C.

Model diagnostics for the ARIMA(1, 1, 0) model $(1 - 0.82B)\nabla z_t = a_t$ fitted to the
same time series are displayed in Figure 8.3. The graph of the residual autocorrelation
function shows fewer large values for this model. This is also reflected in the p-values of
the portmanteau statistic shown at the bottom of the graph. These diagnostic checks show
a clear improvement over the IMA(0, 2, 2) model examined in Figure 8.2. The graph of
the standardized residuals and the normal Q-Q plot reveal that outliers are still present,
however. Methods for outlier detection and adjustments will be discussed in Section 13.2,
where the ARIMA(1, 1, 0) model for Series C is refitted allowing for outliers at $t = 58$, 59,
and 60. Allowing for these outliers in the parameter estimation changes the estimate $\hat{\phi}$ only
slightly from 0.82 to 0.85. However, a larger change occurs in the estimate of the residual
variance, which is reduced by about 26% when the outliers are accounted for in the model.

[Figure 8.3 panels include: standardized residuals; p values for Ljung-Box statistic.]

FIGURE 8.3 Model diagnostics for the ARIMA(1, 1, 0) model fitted to the temperature readings
in Series C.

Before proceeding, we note that Figures 8.2 and 8.3 can be reproduced in R using the
following commands:

>library(astsa)
>seriesC=read.table("seriesC.txt",header=TRUE)
>sarima(seriesC,0,2,2,no.constant=TRUE)   # Figure 8.2
>sarima(seriesC,1,1,0,no.constant=TRUE)   # Figure 8.3

Portmanteau Tests for Series A-F. Table 8.1 summarizes the values of the criterion $\tilde{Q}$ in
(8.2.3) based on $K = 25$ residual autocorrelations for the models fitted to Series A-F in
Table 7.11. However, with regard to the choice of $K$, a somewhat smaller value would be
recommended for use in practice, especially for shorter series such as Series E and F, since
the asymptotic theory involved in the distribution of the statistic $\tilde{Q}$ relies on $K$ growing
(but only slowly, such that $K/n \to 0$) as the series length $n$ increases. In addition, as noted
by Ljung (1986), smaller values of $K$ also have advantages in terms of increased power.
This is particularly true for nonseasonal series, where the lack of fit is expected to be most
evident in residual autocorrelations at the first few lags.

Inspection of Table 8.1 shows that only two suspiciously large values of $\tilde{Q}$ occur.
One is the value $\tilde{Q} = 36.2$ obtained after fitting the IMA(0, 2, 2) model to Series C,
which we have discussed already. The other is the value $\tilde{Q} = 38.8$ obtained after fitting an
IMA(0, 1, 1) model to Series B. This suggests some model inadequacy since the 5 and
2.5% points for $\chi^2$ with 24 degrees of freedom are 36.4 and 39.4, respectively. The nature
of possible model inadequacy for Series B will be examined further in Section 8.2.3.
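The percentage points quoted above, and approximate p-values for the two large statistics, can be obtained with the chi-squared functions in base R. This is a small sketch; the values 38.8 and 36.2 and their degrees of freedom are those reported in Table 8.1.

```r
# Upper 5% and 2.5% points of chi-squared with 24 degrees of freedom
crit = qchisq(c(0.95, 0.975), df = 24)

# Approximate p-values for the two large values of the statistic in Table 8.1
p_seriesB = pchisq(38.8, df = 24, lower.tail = FALSE)  # IMA(0,1,1), Series B
p_seriesC = pchisq(36.2, df = 23, lower.tail = FALSE)  # IMA(0,2,2), Series C
print(round(crit, 2))
print(round(c(B = p_seriesB, C = p_seriesC), 3))
```

Both p-values fall between 0.025 and 0.05, which matches the description of these two fits as suspicious rather than clearly inadequate.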

Other Portmanteau Statistics to Test Model Adequacy. Instead of a portmanteau statistic
based on residual autocorrelations, as in (8.2.3), one could alternatively consider a test for
model adequacy based on residual partial autocorrelations. If the model fitted is adequate,
the associated error process $a_t$ is white noise and one should expect the residual partial
autocorrelation at any lag $k$, which we denote as $\hat{\phi}_{kk}(\hat{a})$, not to be significantly different
from zero. Therefore, a test for model adequacy can be based on the statistic

$$Q^* = n(n + 2) \sum_{k=1}^{K} (n - k)^{-1} \hat{\phi}_{kk}^2(\hat{a}) \tag{8.2.4}$$


Under the hypothesis of model adequacy, Monti (1994) argued that the statistic $Q^*$ in (8.2.4)
is asymptotically distributed as $\chi^2(K - p - q)$, analogous to the asymptotic distribution of
the statistic $\tilde{Q}$ in (8.2.3). Hence, a test of model adequacy can be based on referring the value
of $Q^*$ to the upper critical value determined from this distribution. The test based on $Q^*$
has been found to be typically at least as powerful as $\tilde{Q}$ in detecting departures from model
adequacy, and it seems to be particularly sensitive when the alternative model includes a
higher order moving average term. In practice, since residual partial autocorrelations are
routinely available, we could consider using both the statistic $\tilde{Q}$ in (8.2.3) and $Q^*$ in (8.2.4)
simultaneously in standard model checking procedures.

TABLE 8.1 Summary of Results of Portmanteau Test Applied to Residuals of Various Models
Fitted to Series A-F

Series   n = N - d   Fitted Model                                                      $\tilde{Q}$   Degrees of Freedom
A        197         $z_t - 0.92z_{t-1} = 1.45 + a_t - 0.58a_{t-1}$                     28.4   23
         196         $\nabla z_t = a_t - 0.70a_{t-1}$                                   31.9   24
B        368         $\nabla z_t = a_t + 0.09a_{t-1}$                                   38.8   24
C        225         $\nabla z_t - 0.82\nabla z_{t-1} = a_t$                            31.3   24
         224         $\nabla^2 z_t = a_t - 0.13a_{t-1} - 0.12a_{t-2}$                   36.2   23
D        310         $z_t - 0.87z_{t-1} = 1.17 + a_t$                                   11.5   24
         309         $\nabla z_t = a_t - 0.06a_{t-1}$                                   18.8   24
E        100         $z_t - 1.42z_{t-1} + 0.73z_{t-2} = 14.35 + a_t$                    26.8   23
         100         $z_t - 1.57z_{t-1} + 1.02z_{t-2} - 0.21z_{t-3} = 11.31 + a_t$      20.0   22
F        70          $z_t + 0.34z_{t-1} - 0.19z_{t-2} = 58.87 + a_t$                    14.7   23
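A statistic of the form (8.2.4) can be computed from the residual partial autocorrelations returned by `pacf()` in base R. The sketch below is illustrative: the function name `monti_test` and the simulated MA(1) series are assumptions introduced here, and `fitdf` plays the role of $p + q$.

```r
# Portmanteau statistic based on residual partial autocorrelations, form (8.2.4)
monti_test = function(res, K, fitdf = 0) {
  n = length(res)
  pk = as.numeric(pacf(res, lag.max = K, plot = FALSE)$acf)  # phi_11, ..., phi_KK
  Qstar = n * (n + 2) * sum(pk^2 / (n - seq_len(K)))
  df = K - fitdf
  list(statistic = Qstar, df = df,
       p.value = pchisq(Qstar, df, lower.tail = FALSE))
}

# Illustration on a simulated MA(1) series (not one of the book's series)
set.seed(7)
w = arima.sim(list(ma = 0.5), n = 300)
fit = arima(w, order = c(0, 0, 1), include.mean = FALSE)
res = residuals(fit)
out = monti_test(res, K = 15, fitdf = 1)
print(out)
```

With $K = 1$ the lag-one partial autocorrelation equals the lag-one autocorrelation, so the two portmanteau statistics coincide; they differ only through the higher lags.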

Another portmanteau goodness-of-fit test statistic based on a general measure of multivariate
dependence was proposed by Peña and Rodríguez (2002). Denote the correlation
matrix up to order (lag) $K$ of the residuals $\hat{a}_t$ from the fitted ARIMA($p$, $d$, $q$) model by

$$\hat{\mathbf{P}}_K(\hat{a}) = \begin{bmatrix}
1 & r_1(\hat{a}) & r_2(\hat{a}) & \cdots & r_K(\hat{a}) \\
r_1(\hat{a}) & 1 & r_1(\hat{a}) & \cdots & r_{K-1}(\hat{a}) \\
r_2(\hat{a}) & r_1(\hat{a}) & 1 & \cdots & r_{K-2}(\hat{a}) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
r_K(\hat{a}) & r_{K-1}(\hat{a}) & r_{K-2}(\hat{a}) & \cdots & 1
\end{bmatrix}$$

The proposed statistic is based on the determinant of this correlation matrix, a general
measure of dependence in multivariate analysis, and is given by

$$\hat{D}_K = n\left(1 - |\hat{\mathbf{P}}_K(\hat{a})|^{1/K}\right) \tag{8.2.5}$$

An alternate interpretation for the statistic is obtained from the following relation given by
Peña and Rodríguez (2002):

$$|\hat{\mathbf{P}}_K(\hat{a})|^{1/K} = \prod_{k=1}^{K} \left[1 - \hat{\phi}_{kk}^2(\hat{a})\right]^{(K+1-k)/K}$$

where the $\hat{\phi}_{kk}(\hat{a})$ are the residual partial autocorrelations as in (8.2.4). This expression
shows that $|\hat{\mathbf{P}}_K(\hat{a})|^{1/K}$ is also a weighted function of the first $K$ partial autocorrelations of
the residuals. However, in comparison to the statistics (8.2.3) and (8.2.4), relatively more
weight is given to the lower lag residual correlations in the statistic (8.2.5). The asymptotic
distribution of $\hat{D}_K$ is shown to be a linear combination of $K$ independent $\chi^2(1)$ random
variates, which can be approximated by a gamma distribution (see Peña and Rodríguez,
2002). The authors also proposed and recommended a modification of the statistic $\hat{D}_K$,
here denoted as $\tilde{D}_K$, in which the residual autocorrelations $r_k(\hat{a})$ used to form $\hat{\mathbf{P}}_K(\hat{a})$ are
replaced by the modified values $\sqrt{(n + 2)/(n - k)}\, r_k(\hat{a})$, similar to the modifications used
in the $\tilde{Q}$ and $Q^*$ statistics. Simulation evidence indicates that the statistic $\tilde{D}_K$ may provide
considerable increase in power over the statistics $\tilde{Q}$ and $Q^*$ in many cases, due to its greater
sensitivity to the lower lag residual correlations. Application of this procedure to detection
of several types of nonlinearity, by using sample autocorrelations of squared residuals $\hat{a}_t^2$,
was also explored in Peña and Rodríguez (2002). (For discussion of nonlinearities, see
Sections 10.2 and 10.3.)

Peña and Rodríguez (2006) proposed a modification of their earlier test that
has the same asymptotic distribution as $\hat{D}_K$ but better performance in finite samples.
The modified test statistic has the form $D_K^* = -n \sum_{k=1}^{K} w_k \ln[1 - \hat{\phi}_{kk}^2(\hat{a})]$, where
$w_k = (K + 1 - k)/(K + 1)$. The statistic is thus proportional to a weighted average of
the squared partial autocorrelation coefficients with larger weights given to low-order
coefficients and smaller weights to high-order coefficients. The authors considered two
approximations to the asymptotic distribution of this statistic, and demonstrated using
simulation that the test performs well. Several other authors have extended the work of Peña
and Rodríguez (2002) and proposed portmanteau statistics that are asymptotically similar
to their statistics; for a discussion and references, see Fisher and Gallagher (2012). See also
Li (2004) for a more detailed discussion of diagnostic testing.
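The determinant-based statistics and the product relation linking the determinant to the partial autocorrelations can be checked numerically. The sketch below uses base R only; the function name `pr_stats` is hypothetical, white noise stands in for model residuals, and the unmodified autocorrelations are used (so it illustrates the form (8.2.5) and the logarithmic form, not the modified statistic with rescaled autocorrelations). No reference distribution is computed.

```r
# Determinant-based portmanteau quantities from a residual series
pr_stats = function(res, K) {
  n = length(res)
  r = acf(res, lag.max = K, plot = FALSE)$acf[-1]            # r_1, ..., r_K
  pk = as.numeric(pacf(res, lag.max = K, plot = FALSE)$acf)  # phi_11, ..., phi_KK
  P = toeplitz(c(1, r))               # (K+1) x (K+1) residual correlation matrix
  det_root = det(P)^(1 / K)
  prod_form = prod((1 - pk^2)^((K + 1 - seq_len(K)) / K))    # product relation
  D = n * (1 - det_root)                                     # form (8.2.5)
  wts = (K + 1 - seq_len(K)) / (K + 1)
  Dstar = -n * sum(wts * log(1 - pk^2))                      # logarithmic form
  c(det_root = det_root, prod_form = prod_form, D = D, Dstar = Dstar)
}

set.seed(3)
res = rnorm(250)        # white noise standing in for model residuals
out = pr_stats(res, K = 10)
print(out)
```

The two evaluations of $|\hat{\mathbf{P}}_K|^{1/K}$ agree to rounding error because the sample partial autocorrelations are obtained from the same sample autocorrelations by the Durbin-Levinson recursion.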


8.2.3 Model Inadequacy Arising from Changes in Parameter Values 

Another form of model inadequacy occurs when the form of the model remains the same
but the parameters change over a prolonged period of time. In fact, it appears that this can
explain the possible inadequacy of the (0, 1, 1) model fitted to the IBM data.

Table 8.2 shows the results obtained by fitting (0, 1, 1) models separately to the first
and second halves of Series B as well as to the complete series. Denoting the estimates of
$\lambda = 1 - \theta$ obtained from the two halves by $\hat{\lambda}_1$ and $\hat{\lambda}_2$, we find that the standard error of
$\hat{\lambda}_1 - \hat{\lambda}_2$ is $\sqrt{(0.070)^2 + (0.074)^2} = 0.102$. Since the difference $\hat{\lambda}_1 - \hat{\lambda}_2 = 0.26$ is 2.6 times
its standard error, it is likely that a real change in $\lambda$ has occurred. Inspection of the $\tilde{Q}$ values
suggests that the (0, 1, 1) model, with parameters appropriately modified for different
time periods, might explain the series more exactly. The estimation results for the residual
variances $\hat{\sigma}_a^2$ also strongly indicate that a real change in variability has occurred between
the two halves of the series.
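The arithmetic behind this comparison is easily reproduced. The sketch below takes $\hat{\lambda}_1 = 1.29$ with $n = 184$ and $\hat{\lambda}_2 = 1.03$ with $n = 183$ from Table 8.2, uses the large-sample standard error formula $[\hat{\lambda}(2 - \hat{\lambda})/n]^{1/2}$ given in the table heading, and treats the two halves as independent, as the calculation in the text does.

```r
# Large-sample standard error of lambda-hat = 1 - theta-hat
se_lambda = function(lambda, n) sqrt(lambda * (2 - lambda) / n)

se1 = se_lambda(1.29, 184)        # first half of Series B
se2 = se_lambda(1.03, 183)        # second half of Series B
se_diff = sqrt(se1^2 + se2^2)     # halves treated as independent
ratio = (1.29 - 1.03) / se_diff   # difference in units of its standard error
print(round(c(se1 = se1, se2 = se2, se_diff = se_diff, ratio = ratio), 3))
```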

This is confirmed by Figure 8.4, which shows the standardized residuals and other model
diagnostics for the IMA(0, 1, 1) model fitted to Series B. An increase in the standardized
residuals around time $t = 236$ indicates a change in the characteristics of the series around
that time. In fact, fitting the IMA(0, 1, 1) model separately to the first 235 observations
and to the remaining 134 observations yields the estimates $\hat{\theta}_1 = -0.26$, $\hat{\sigma}_1^2 = 24.55$, and
$\hat{\theta}_2 = -0.02$, $\hat{\sigma}_2^2 = 99.49$, respectively. Hence, a substantial increase in variability during
the latter portion of the series is clearly indicated. Additional approaches to explain and
account for inadequacy in the overall IMA(0, 1, 1) model for Series B, which include
allowance for conditional heteroscedasticity in the noise, nonlinearity, and mixture transition
distributions, have been discussed by Tong (1990) and Le et al. (1996), among others.
Some of these modeling approaches will be surveyed in general in Chapter 10.


TABLE 8.2 Comparison of IMA(0, 1, 1) Models Fitted to First and Second Halves of Series B

              n     $\hat{\theta}$   $\hat{\lambda} = 1 - \hat{\theta}$   $\hat{\sigma}(\hat{\lambda}) = [\hat{\lambda}(2 - \hat{\lambda})/n]^{1/2}$   Residual Variance $\hat{\sigma}_a^2$   $\tilde{Q}$   Degrees of Freedom
First half    184   -0.29    1.29    ±0.070    26.3    24.6   24
Second half   183   -0.03    1.03    ±0.074    77.3    37.1   24
Complete      368   -0.09    1.09    ±0.052    52.2    38.8   24

[Figure 8.4 panels: standardized residuals; residual autocorrelation function against lag; normal Q-Q plot
(theoretical quantiles); p values for Ljung-Box statistic.]

FIGURE 8.4 Model diagnostics for the IMA(0, 1, 1) model fitted to the IBM daily closing stock
prices in Series B.


8.2.4 Score Tests for Model Checking 

An alternative to the direct use of overfitting in model checking is provided by the Lagrange 
multiplier or score test procedure, which is also closely related to the portmanteau test 
procedure. The general score test procedure was presented by Silvey (1959), and its use 
in diagnostic checking for ARIMA models was discussed initially by Godfrey (1979) and 
Poskitt and Tremayne (1980). A computational advantage of the score test procedure is that 
it requires maximum likelihood estimation of parameters only under the null model under 
test, but it yields tests asymptotically equivalent to the corresponding likelihood ratio tests 
obtained by directly overfitting the model. Furthermore, the score test statistic is easily 
computed in the form of the sample size n times a coefficient of determination from a 
particular “auxiliary” regression. 

Hence, we assume that an ARMA($p$, $q$) model has been fitted by the maximum likelihood
method to the observations $\tilde{w}_t$, and we want to assess the adequacy of the model by testing
this null model against the alternative of an ARMA($p + r$, $q$) model or of an ARMA($p$, $q + r$)
model. That is, for the ARMA($p + r$, $q$) alternative, we test $H_0: \phi_{p+1} = \cdots = \phi_{p+r} = 0$,
while for the ARMA($p$, $q + r$) alternative, we test $H_0: \theta_{q+1} = \cdots = \theta_{q+r} = 0$. The score
test procedure is based on the first partial derivatives, or scores, of the log-likelihood
function with respect to the model parameters of the alternative model, but evaluated at
the ML estimates obtained under the null model. The log-likelihood function is essentially
given by $l = -(n/2)\ln(\sigma_a^2) - (\tfrac{1}{2}\sigma_a^{-2}) \sum_{t=1}^{n} a_t^2$. So, the partial derivatives of $l$ with respect
to the parameters $(\boldsymbol{\phi}, \boldsymbol{\theta})$ are

$$\frac{\partial l}{\partial \phi_j} = -\frac{1}{\sigma_a^2} \sum_{t=1}^{n} \frac{\partial a_t}{\partial \phi_j}\, a_t
\qquad
\frac{\partial l}{\partial \theta_j} = -\frac{1}{\sigma_a^2} \sum_{t=1}^{n} \frac{\partial a_t}{\partial \theta_j}\, a_t$$

As in (7.2.9) and (7.2.10), we have

$$\frac{\partial a_t}{\partial \phi_j} = -u_{t-j}
\qquad
\frac{\partial a_t}{\partial \theta_j} = -v_{t-j}$$

where $u_t = \theta^{-1}(B)\tilde{w}_t = \phi^{-1}(B)a_t$, and $v_t = -\theta^{-1}(B)a_t$. Given residuals $\hat{a}_t$, obtained from
ML fitting of the null model, as

$$\hat{a}_t = \tilde{w}_t - \sum_{j=1}^{p} \hat{\phi}_j \tilde{w}_{t-j} + \sum_{j=1}^{q} \hat{\theta}_j \hat{a}_{t-j}, \qquad t = 1, 2, \ldots, n$$

the $u_t$'s and $v_t$'s evaluated under the ML estimates of the null model can be calculated
recursively, starting with initial values set equal to zero, for example, as

$$u_t = \tilde{w}_t + \hat{\theta}_1 u_{t-1} + \cdots + \hat{\theta}_q u_{t-q}$$

$$v_t = -\hat{a}_t + \hat{\theta}_1 v_{t-1} + \cdots + \hat{\theta}_q v_{t-q}$$

The score vector of first partial derivatives with respect to all the model parameters $\boldsymbol{\beta}$
can be expressed as

$$\frac{\partial l}{\partial \boldsymbol{\beta}} = \frac{1}{\sigma_a^2} \mathbf{X}'\mathbf{a} \tag{8.2.6}$$

where $\mathbf{a} = (a_1, \ldots, a_n)'$ and $\mathbf{X}$ denotes the $n \times (p + q + r)$ matrix whose $t$th row consists of
$(u_{t-1}, \ldots, u_{t-p-r}, v_{t-1}, \ldots, v_{t-q})$ in the case of the ARMA($p + r$, $q$) alternative model and
$(u_{t-1}, \ldots, u_{t-p}, v_{t-1}, \ldots, v_{t-q-r})$ in the case of the ARMA($p$, $q + r$) alternative model. Then,
similar to (7.2.17), since the large-sample information matrix for $\boldsymbol{\beta}$ can be consistently
estimated by $\hat{\sigma}_a^{-2}\mathbf{X}'\mathbf{X}$, where $\hat{\sigma}_a^2 = n^{-1} \sum_{t=1}^{n} \hat{a}_t^2 = n^{-1}\hat{\mathbf{a}}'\hat{\mathbf{a}}$, it follows that the score test
statistic for testing that the additional $r$ parameters are equal to zero is

$$\Lambda = \frac{\hat{\mathbf{a}}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\hat{\mathbf{a}}}{\hat{\sigma}_a^2} \tag{8.2.7}$$

Godfrey (1979) noted that the computation of the test statistic in (8.2.7) can be given the
interpretation as being equal to $n$ times the coefficient of determination in an auxiliary
regression equation. That is, if the alternative model is ARMA($p + r$, $q$), we consider the
auxiliary regression equation

$$\hat{a}_t = \alpha_1 u_{t-1} + \cdots + \alpha_{p+r} u_{t-p-r} + \beta_1 v_{t-1} + \cdots + \beta_q v_{t-q} + \varepsilon_t$$

while if the alternative model is ARMA($p$, $q + r$), we consider the regression equation

$$\hat{a}_t = \alpha_1 u_{t-1} + \cdots + \alpha_p u_{t-p} + \beta_1 v_{t-1} + \cdots + \beta_{q+r} v_{t-q-r} + \varepsilon_t$$

Let $\hat{\varepsilon}_t$ denote the residuals from the ordinary least-squares estimation of this regression
equation. Then from (8.2.7), it is seen that $\Lambda$ can be expressed, essentially, as

$$\Lambda = n\left(1 - \frac{\sum_{t=1}^{n} \hat{\varepsilon}_t^2}{\sum_{t=1}^{n} \hat{a}_t^2}\right)$$

which is $n$ times the coefficient of determination of the regression of the $\hat{a}_t$'s on the $u_{t-j}$'s
and the $v_{t-j}$'s. Under the null hypothesis that the fitted ARMA($p$, $q$) model is correct, the
statistic $\Lambda$ has an asymptotic $\chi^2$ distribution with $r$ degrees of freedom, and the null model
is rejected as inadequate for large values of $\Lambda$.

As argued by Godfrey (1979) and others, rejection of the null model by the score test
procedure should not be taken as evidence to adopt the specific alternative model involved,
but simply as evidence against the adequacy of the fitted model. Similarly, the score test
is expected to have reasonable power even when the alternative model is not correctly
specified. Poskitt and Tremayne (1980) showed, for example, that the score test against
an ARMA($p + r$, $q$) model alternative is asymptotically identical to a test against an
ARMA($p$, $q + r$) alternative. Hence, the score test procedure may not be sensitive to the
particular model specified under the alternative, but its performance will, of course, depend
on the choice of the number $r$ of additional parameters specified.

We also note an alternative form for the score statistic $\Lambda$. By the ML estimation
procedure, it follows that the first partial derivatives, $\partial l/\partial \phi_j$, $j = 1, \ldots, p$, and $\partial l/\partial \theta_j$,
$j = 1, \ldots, q$, will be identically equal to zero when evaluated at the ML estimates. Hence,
the score vector, $\partial l/\partial \boldsymbol{\beta}$, will contain only $r$ nonzero elements when evaluated at the ML
estimates from the null model, these being the partial derivatives with respect to the additional
$r$ parameters of the alternative model. Thus, the score statistic in (8.2.7) can also be
viewed as a quadratic form in these $r$ nonzero values, whose matrix in the quadratic form
is a consistent estimate of the inverse of the covariance matrix of these $r$ score values when
evaluated at the ML estimates obtained under the null model. Since these $r$ score values are
asymptotically normal with zero means under the null model, the validity of the asymptotic
$\chi^2(r)$ distribution under the null hypothesis is easily seen.

Newbold (1980) noted that a score test against the alternative of $r$ additional parameters
is closely related to an appropriate test statistic based on the first $r$ residual autocorrelations
$r_k(\hat{a})$ from the fitted model. The test statistic is essentially a quadratic form in these first
$r$ residual autocorrelations, but of a more complex form than the portmanteau statistic
in (8.2.2). As a direct illustration, suppose that the fitted or null model is a pure AR($p$)
model, and the alternative is an ARMA($p$, $r$) model. Then, it follows from above that the
variables $v_{t-j}$ are identical to $-\hat{a}_{t-j}$, since $\theta(B) \equiv 1$ under the null model. Hence, the
nonzero elements of the score vector in (8.2.6) are equal to $-n$ times the first $r$ residual
autocorrelations, $r_1(\hat{a}), \ldots, r_r(\hat{a})$, from the fitted model, and the score test is thus directly
seen to be a quadratic form in these first $r$ residual autocorrelations.
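The auxiliary-regression form of the score test can be sketched in base R as follows. This is an illustration under stated assumptions rather than a reference implementation: the function name `score_test_arma` is hypothetical, only the ARMA($p + r$, $q$) alternative is handled, the series is assumed to have zero mean, and the residuals returned by `arima()` are used in place of the recursively computed residuals (the two agree closely except at the start of the series). Note that R writes the MA operator with plus signs, so the signs of the estimated MA coefficients are reversed to match the book's convention.

```r
# Score (Lagrange multiplier) test of an ARMA(p, q) null against ARMA(p + r, q)
score_test_arma = function(w, p, q, r) {
  w = as.numeric(w)
  n = length(w)
  fit = arima(w, order = c(p, 0, q), include.mean = FALSE, method = "ML")
  a_hat = as.numeric(residuals(fit))
  # Book convention: theta(B) = 1 - theta_1 B - ...; R stores ma_j = -theta_j
  theta = if (q > 0) -as.numeric(coef(fit)[paste0("ma", seq_len(q))]) else numeric(0)
  # Apply theta^{-1}(B): y_t = x_t + theta_1 y_{t-1} + ... (zero initial values)
  inv_theta = function(x) {
    if (q == 0) x else as.numeric(stats::filter(x, theta, method = "recursive"))
  }
  u = inv_theta(w)
  v = inv_theta(-a_hat)
  lagk = function(x, k) c(rep(0, k), x[seq_len(n - k)])  # zero pre-sample values
  X = cbind(sapply(seq_len(p + r), function(k) lagk(u, k)),
            if (q > 0) sapply(seq_len(q), function(k) lagk(v, k)))
  e = lm.fit(X, a_hat)$residuals             # auxiliary regression, no intercept
  Lambda = n * (1 - sum(e^2) / sum(a_hat^2)) # n times uncentered R-squared
  c(Lambda = Lambda, df = r, p.value = pchisq(Lambda, r, lower.tail = FALSE))
}

# Illustration: data simulated from an AR(2) process, null model AR(1)
set.seed(11)
w = arima.sim(list(ar = c(0.5, -0.5)), n = 400)
out = score_test_arma(w, p = 1, q = 0, r = 1)
print(out)
```

In this simulated example the omitted second autoregressive term is large, so the statistic should fall far in the upper tail of the $\chi^2(1)$ distribution and the AR(1) null model is rejected.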


8.2.5 Cumulative Periodogram Check 

In some situations, particularly in the fitting of seasonal time series, which are discussed 
in Chapter 9, it may be feared that we have not adequately taken into account the periodic 
characteristics of the series. Therefore, we are on the lookout for periodicities in the 
residuals. The autocorrelation function will not be a sensitive indicator of such departures 
from randomness because periodic effects will typically dilute themselves among several
autocorrelations. The periodogram, on the other hand, is specifically designed for the
detection of periodic patterns in a background of white noise.

The periodogram of a time series $a_t$, $t = 1, 2, \ldots, n$, as defined in Section 2.2.1, is

$$I(f_i) = \frac{2}{n}\left[\left(\sum_{t=1}^{n} a_t \cos(2\pi f_i t)\right)^2 + \left(\sum_{t=1}^{n} a_t \sin(2\pi f_i t)\right)^2\right] \tag{8.2.8}$$

where $f_i = i/n$ is the frequency. Thus, it is a device for correlating the $a_t$'s with sine and
cosine waves of different frequencies. A pattern with given frequency $f_i$ in the residuals
is reinforced when correlated with a sine or cosine wave at that same frequency, and so
produces a large value of $I(f_i)$.


Cumulative Periodogram. Bartlett (1955) and other authors have shown that the cumulative
periodogram provides an effective means for the detection of periodic nonrandomness.

The power spectrum $p(f)$ for white noise has a constant value $2\sigma_a^2$ over the frequency
domain 0-0.5 cycle. Consequently, the cumulative spectrum for white noise

$$P(f) = \int_0^f p(g)\, dg \tag{8.2.9}$$

plotted against $f$ is a straight line running from (0, 0) to $(0.5, \sigma_a^2)$; that is, $P(f)/\sigma_a^2$ is a
straight line running from (0, 0) to (0.5, 1).

The periodogram $I(f)$ provides an estimate of the power spectrum at frequency $f$. In
fact, for white noise, $E[I(f)] = 2\sigma_a^2$, and hence the estimate is unbiased. It follows that
$(1/n) \sum_{i=1}^{j} I(f_i)$ provides an unbiased estimate of the integrated spectrum $P(f_j)$, and

$$C(f_j) = \frac{\sum_{i=1}^{j} I(f_i)}{n s^2} \tag{8.2.10}$$

an estimate of $P(f_j)/\sigma_a^2$, where $s^2$ is an estimate of $\sigma_a^2$. We will refer to $C(f_j)$ as the
normalized cumulative periodogram.
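The normalized cumulative periodogram can be computed directly from its definition. The sketch below uses base R; the function name `cum_periodogram` is hypothetical, the frequencies are taken as $f_i = i/n$ for $i = 1, \ldots, \lfloor (n-1)/2 \rfloor$, and $s^2$ is taken to be the mean-corrected average of squares, which is one possible estimate of $\sigma_a^2$. White noise stands in for the residual series.

```r
# Periodogram ordinates and normalized cumulative periodogram of a series
cum_periodogram = function(a) {
  n = length(a)
  q = floor((n - 1) / 2)            # number of frequencies strictly below 0.5
  tt = seq_len(n)
  f = seq_len(q) / n
  I = sapply(f, function(fi) {
    (2 / n) * (sum(a * cos(2 * pi * fi * tt))^2 + sum(a * sin(2 * pi * fi * tt))^2)
  })
  s2 = mean((a - mean(a))^2)        # estimate of the white noise variance
  list(freq = f, I = I, C = cumsum(I) / (n * s2))
}

set.seed(5)
a = rnorm(201)                      # white noise standing in for residuals
cp = cum_periodogram(a)

# Largest vertical distance from the theoretical line joining (0, 0) and (0.5, 1)
print(max(abs(cp$C - 2 * cp$freq)))
```

For an odd series length and this choice of $s^2$, the last value of $C(f_j)$ equals one exactly, since the periodogram ordinates then add up to the corrected sum of squares. The stats package also provides `cpgram()`, which draws a cumulative periodogram plot with confidence limits.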

Now, if the model was adequate and the parameters known exactly, the $a_t$'s could be
computed from the data and would yield a white noise series. For a white noise series, the
plot of $C(f_j)$ against $f_j$ would be scattered about a straight line joining the points (0, 0)
and (0.5, 1). On the other hand, model inadequacies would produce nonrandom $a_t$'s, whose
cumulative periodogram could show systematic deviations from this line. In particular,
periodicities in the $a_t$'s would tend to produce a series of neighboring values of $I(f_j)$ that
were large. These large ordinates would reinforce each other in $C(f_j)$ and form a bump on
the expected straight line.

In practice, we do not know the exact values of the parameters, but only their estimated
values. Hence, we do not have the $a_t$'s, but only the estimated residuals $\hat{a}_t$'s.
However, for large samples, the periodogram for the $\hat{a}_t$'s will have similar properties to
that for the $a_t$'s. Thus, careful inspection of the periodogram of the $\hat{a}_t$'s can provide a
useful additional diagnostic check, particularly for indicating periodicities taken account of
inadequately.

Example: Series C. We have seen that Series C is well fitted by the (1, 1, 0) model:

$$(1 - 0.82B)\nabla z_t = a_t$$

and somewhat less well by the IMA(0, 2, 2) model:

$$\nabla^2 z_t = (1 - 0.13B - 0.12B^2)a_t$$

which is rather similar to it. We illustrate the cumulative periodogram test by showing
what happens when we analyze the residuals $\hat{a}_t$ after fitting to the series an inadequate
IMA(0, 1, 1) model:

$$\nabla z_t = (1 - \theta B)a_t$$

where the least squares estimate of $\theta$ is found to be $-0.65$. The normalized cumulative
periodogram plot of the residuals from this model is shown in Figure 8.5(a). We see
immediately that there are marked departures from linearity in the cumulative periodogram.
These departures are very pronounced at low frequencies, as might be expected, for
example, if the degree of differencing is insufficient. Figure 8.5(b) shows the corresponding
plot for the best-fitting IMA(0, 2, 2) model. The points of the cumulative periodogram now
cluster more closely about the expected line, although, as we have seen in Table 8.1 and
Figure 8.2, other evidence points to the inadequacy of this model.

It is wise to indicate on the diagram the period as well as the frequency. This makes 
for easy identification of the bumps that occur when residuals contain periodicities. For 
example, in monthly sales data, bumps near periods 12, 24, 36, and so on might indicate 
that seasonal effects were accounted for inadequately. 

The probability relationship between the cumulative periodogram and the integrated
spectrum is precisely the same as that between the empirical cumulative frequency function
and the cumulative distribution function. For this reason we can assess deviations
of the periodogram from that expected if the $a_t$'s were white noise, by use of the
Kolmogorov-Smirnov test. Using this test, we can place limit lines about the theoretical
line. The limit lines are such that if the $a_t$ series were white noise, the cumulative
periodogram would deviate from the straight line sufficiently to cross these limits only
with the stated probability. Now, because the $\hat{a}_t$'s are fitted values and not the true $a_t$'s,
we know that even when the model is correct, they will not precisely follow a white noise
process. Thus, as a test for model inadequacy, application of the Kolmogorov-Smirnov
limits will indicate only approximate probabilities. However, it is worthwhile to show these
limits on the cumulative periodogram to provide a rough guide as to what deviations to
regard with skepticism and what to take more note of.

The limit lines are such that for a truly random or white noise series, they would be
crossed a proportion $\varepsilon$ of the time. They are drawn at distances $\pm K_\varepsilon/\sqrt{q}$ above and below
the theoretical line, where $q = (n - 2)/2$ for $n$ even and $(n - 1)/2$ for $n$ odd. Approximate
values for $K_\varepsilon$ are given in Table 8.3.


TABLE 8.3 Coefficients for Calculating Approximate Probability Limits for Cumulative
Periodogram Test

$\varepsilon$        $K_\varepsilon$
0.01     1.63
0.05     1.36
0.10     1.22
0.25     1.02

FIGURE 8.5 Series C: cumulative periodograms of residuals from best-fitting models (a) of order
(0, 1, 1) and (b) of order (0, 2, 2).


For Series C, $q = (224 - 2)/2 = 111$, and the 5% limit lines inserted on Figure 8.5
deviate from the theoretical line by amounts $\pm 1.36/\sqrt{111} = \pm 0.13$. Similarly, the 25%
limit lines deviate by $\pm 1.02/\sqrt{111} = \pm 0.10$.
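
The limit calculation above, together with the cumulative periodogram it is applied to, can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and the cumulative periodogram is normalized here by the sum of its own ordinates (so that its last value is exactly 1), which is a convenience assumption rather than the exact normalization used in the text.

```python
import math

# Approximate coefficients K_eps taken from Table 8.3.
K_EPS = {0.01: 1.63, 0.05: 1.36, 0.10: 1.22, 0.25: 1.02}

def num_frequencies(n):
    """q = (n - 2)/2 for n even and (n - 1)/2 for n odd."""
    return (n - 2) // 2 if n % 2 == 0 else (n - 1) // 2

def ks_half_width(n, eps):
    """Distance K_eps / sqrt(q) of the limit lines from the theoretical line."""
    return K_EPS[eps] / math.sqrt(num_frequencies(n))

def cumulative_periodogram(a):
    """Return [(f_j, C(f_j))] for f_j = j/n, j = 1..q.

    Assumption: C is normalized by the sum of the q periodogram
    ordinates, so the final value equals 1.
    """
    n = len(a)
    q = num_frequencies(n)
    mean = sum(a) / n
    ordinates = []
    for j in range(1, q + 1):
        w = 2.0 * math.pi * j / n
        c = sum((x - mean) * math.cos(w * t) for t, x in enumerate(a, start=1))
        s = sum((x - mean) * math.sin(w * t) for t, x in enumerate(a, start=1))
        ordinates.append(2.0 / n * (c * c + s * s))
    total = sum(ordinates)
    out, running = [], 0.0
    for j, value in enumerate(ordinates, start=1):
        running += value
        out.append((j / n, running / total))
    return out

# Series C has n = 224 residual values, so q = 111.
limit_5 = ks_half_width(224, 0.05)    # about 0.13
limit_25 = ks_half_width(224, 0.25)   # about 0.10
```

A residual series would then be judged suspicious when its cumulative periodogram strays from the straight line by more than the chosen half-width, bearing in mind that the probabilities are only approximate for fitted residuals.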

Conclusions. Each of the model checking procedures described above has essential
advantages and disadvantages. Checks based on the study of the estimated autocorrelation
function and the cumulative periodogram, although they can point out unsuspected
peculiarities of the series, may not be particularly sensitive. Tests for specific departures by
overfitting are more sensitive but may fail to warn of trouble other than that specifically
anticipated. Portmanteau tests based on the residual autocorrelations and partial
autocorrelations, while not always sensitive, provide convenient summary measures that are easy to
use. As a result, they are now available in many software packages.


8.3 USE OF RESIDUALS TO MODIFY THE MODEL 

8.3.1 Nature of the Correlations in the Residuals When an Incorrect Model Is Used 

When the autocorrelation function of the residuals from a fitted model indicates that the 
model is inadequate, it is necessary to consider in what way the model should be modified. 
In Section 8.3.2, we show how the autocorrelations of the residuals can be used to suggest 
such modifications. As an introduction, we consider the effect of fitting an incorrect model 
on the autocorrelation function of the residuals. 

Suppose that the correct model is

$$\phi(B)w_t = \theta(B)a_t$$

but that an incorrect model

$$\phi_0(B)w_t = \theta_0(B)b_t$$

is used. Then the residuals $b_t$, in the incorrect model, will be correlated and since

$$b_t = \theta_0^{-1}(B)\theta(B)\phi_0(B)\phi^{-1}(B)a_t \qquad (8.3.1)$$

the autocovariance generating function of the $b_t$'s will be

$$\sigma_a^2[\theta_0^{-1}(B)\theta_0^{-1}(F)\theta(B)\theta(F)\phi_0(B)\phi_0(F)\phi^{-1}(B)\phi^{-1}(F)] \qquad (8.3.2)$$

For example, suppose that in an IMA(0, 1, 1) process, instead of the correct value $\theta$, we
use some other value $\theta_0$. Then the residuals $b_t$ would follow the mixed process of order
(1, 0, 1):

$$(1 - \theta_0 B)b_t = (1 - \theta B)a_t$$

and using (3.4.8), we have

$$\rho_1 = \frac{(1 - \theta\theta_0)(\theta_0 - \theta)}{1 + \theta^2 - 2\theta\theta_0}$$

$$\rho_j = \rho_1\theta_0^{j-1} \qquad j = 2, 3, \ldots$$

For example, suppose that in the IMA(0, 1, 1) process,

$$\nabla z_t = (1 - \theta B)a_t$$

we took $\theta_0 = 0.8$ when the correct value was $\theta = 0$. Then

$$\theta_0 = 0.8 \qquad \theta = 0.0$$

$$\rho_1 = 0.8 \qquad \rho_j = 0.8^j$$

Thus, the $b_t$'s would be highly autocorrelated and, since $(1 - 0.8B)b_t = \nabla z_t = a_t$, $b_t$ would
follow the autoregressive process

$$(1 - 0.8B)b_t = a_t$$
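
As a numerical check of the residual autocorrelations just derived, the formulas for $\rho_1$ and $\rho_j$ can be evaluated directly. The short Python sketch below uses a hypothetical helper name and simply codes the two expressions obtained from (3.4.8); it is meant as an illustration, not as part of the original development.

```python
def residual_acf(theta, theta0, max_lag):
    """Autocorrelations rho_1..rho_max_lag of the residuals b_t when an
    IMA(0,1,1) process with parameter theta is fitted with theta0, so that
    (1 - theta0*B) b_t = (1 - theta*B) a_t."""
    rho1 = ((1.0 - theta * theta0) * (theta0 - theta)
            / (1.0 + theta ** 2 - 2.0 * theta * theta0))
    # rho_j = rho_1 * theta0**(j - 1) for j >= 1
    return [rho1 * theta0 ** (j - 1) for j in range(1, max_lag + 1)]

# Case discussed in the text: theta0 = 0.8 used when the correct theta = 0,
# giving rho_j = 0.8**j.
rho = residual_acf(theta=0.0, theta0=0.8, max_lag=4)
```

When `theta0` equals `theta` the function returns zeros at every lag, which is the white noise behavior expected of residuals from a correctly specified model.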


8.3.2 Use of Residuals to Modify the Model 

Suppose that the residuals $b_t$ from the model

$$\phi_0(B)\nabla^{d_0}z_t = \theta_0(B)b_t \qquad (8.3.3)$$

appear to be nonrandom, that is, to deviate from white noise behavior. Using the
autocorrelation function of $b_t$, the methods of Chapter 6 may now be applied to identify a
model:

$$\phi_1(B)\nabla^{d_1}b_t = \theta_1(B)a_t \qquad (8.3.4)$$

for the $b_t$ series. On eliminating $b_t$ between (8.3.3) and (8.3.4), we arrive at a new model:

$$\phi_0(B)\phi_1(B)\nabla^{d_0}\nabla^{d_1}z_t = \theta_0(B)\theta_1(B)a_t \qquad (8.3.5)$$

which can now be fitted and diagnostically checked.

For example, suppose that a series had been wrongly identified as an IMA(0, 1, 1)
process and fitted to give the model:

$$\nabla z_t = (1 + 0.6B)b_t \qquad (8.3.6)$$

Also, suppose that a model

$$\nabla b_t = (1 - 0.8B)a_t \qquad (8.3.7)$$

was identified for this residual series. Then on eliminating $b_t$ between (8.3.6) and (8.3.7),
we would obtain

$$\nabla^2 z_t = (1 + 0.6B)\nabla b_t = (1 + 0.6B)(1 - 0.8B)a_t = (1 - 0.2B - 0.48B^2)a_t$$

which would suggest that an IMA(0, 2, 2) process should now be entertained.
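
The elimination step amounts to multiplying the two moving average operators as polynomials in $B$. A minimal Python sketch of that multiplication is given below; the helper `poly_mul` is hypothetical, and operators are represented, by assumption, as coefficient lists in increasing powers of $B$.

```python
def poly_mul(p, q):
    """Multiply two polynomials in B given as coefficient lists
    [c0, c1, c2, ...] for c0 + c1*B + c2*B**2 + ..."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

theta0 = [1.0, 0.6]    # 1 + 0.6B, the operator fitted to z_t
theta1 = [1.0, -0.8]   # 1 - 0.8B, the factor appearing in the product for b_t
combined = poly_mul(theta0, theta1)   # coefficients of 1 - 0.2B - 0.48B^2
```

The same helper could be used for the autoregressive and differencing operators in (8.3.5), since they combine by multiplication in exactly the same way.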
EXERCISES

8.1. The following are the first 30 residuals obtained when a tentative model was fitted
to a time series:

t        Residuals
1-6       0.78    0.91    0.45   -0.78   -1.90   -2.10
7-12     -0.54   -1.05    0.68   -3.77   -1.40   -1.77
13-18     1.18    0.02    1.29   -1.30   -6.20   -1.89
19-24     0.95    1.49    1.08    0.80    2.02    1.25
25-30     0.52    2.31    1.64    0.78    1.99    1.36

Plot the values and state any reservations you have concerning the adequacy of
the model.

8.2. The residuals from a model $\nabla z_t = (1 - 0.6B)a_t$ fitted to a series of $N = 82$
observations yielded the following residual autocorrelations:

k     $r_k(\hat{a})$      k     $r_k(\hat{a})$
1      0.39       6     -0.13
2      0.20       7     -0.05
3      0.09       8      0.06
4      0.04       9      0.11
5      0.09      10      0.02

(a) Plot the residual ACF and determine whether there are any abnormal values 
relative to white noise behavior. 

(b) Calculate the chi-square statistic Q for lags up to K = 10 and check whether the 
residual autocorrelation function as a whole is indicative of model inadequacy. 

(c) What modified model would you now tentatively entertain, fit, and check? 

8.3. A long series containing $N = 326$ observations was split into two halves and a
(1, 1, 0) model $(1 - \phi B)\nabla z_t = a_t$ identified, fitted, and checked for each half. If the
estimates of the parameter $\phi$ for the two halves are $\hat{\phi}^{(1)} = 0.5$ and $\hat{\phi}^{(2)} = 0.7$, is there
any evidence that the parameter $\phi$ has changed?

8.4. (a) Show that the variance of the sample mean $\bar{z}$ of $n$ observations from a stationary
AR(1) process $(1 - \phi B)\tilde{z}_t = a_t$ is given by

$$\mathrm{var}[\bar{z}] \simeq \frac{\sigma_a^2}{n(1 - \phi)^2}$$

(b) The yields from consecutive batches of a chemical process obtained under fairly
uniform conditions of process control were shown to follow a stationary AR(1)
process $(1 + 0.5B)\tilde{z}_t = a_t$. A technical innovation is made at a given point in time
leading to 85 data points with mean $\bar{z}_1 = 41.0$ and residual variance $s_1^2 = 0.1012$
before the innovation is made and 60 data points with $\bar{z}_2 = 43.5$ and $s_2^2 = 0.0895$
after the innovation. Is there any evidence that the innovation has improved
(increased) the yield?

8.5. Suppose that a (0, 1, 1) model $\nabla z_t = (1 - \theta B)e_t$, corresponding to the use of an
exponentially weighted moving average forecast, with $\theta$ arbitrarily chosen to be
equal to 0.5, was used to forecast a series that was, in fact, well represented by the
(0, 1, 2) model $\nabla z_t = (1 - 0.9B + 0.2B^2)a_t$.

(a) Calculate the autocorrelation function of the lead 1 forecast errors $e_t$ obtained
from the (0, 1, 1) model.

(b) Show how this ACF could be used to identify a model for the $e_t$ series, leading
to the identification of a (0, 1, 2) model for the $z_t$ series.

8.6. Two time series models, AR(2) and AR(3), were fitted to the yearly time series 
of sunspot numbers for the period 1770-1869 in Chapter 7. The sunspot data are 
available for the slightly longer time period 1700-1988 as series ‘sunspot.year’ in 
the datasets package in R; type help(sunspot.year) for details. Perform diagnostic 
checking to determine the adequacy of the AR(2) and AR(3) models for this longer 
time period. Are there alternative models that you would consider for this series? 
Would you recommend that a data transformation be used in this case? 

8.7. Monthly sales, $\{Y_t\}$, of a company over a period of 150 months are provided as part
of Series M in Part 5 of this book. This series is also available as series BJsales
along with a related series BJsales.lead in the datasets package in R.

(a) Plot the data and comment.

(b) Perform a statistical analysis to determine a suitable model for this series.
Estimate the parameters using the maximum likelihood method.

(c) Repeat the analysis for the series of leading indicator BJsales.lead that is part
of the same dataset.

(d) Perform diagnostic checking to determine if there is any lack of fit in the models
selected for the two series.

8.8. Global mean surface temperature deviations (from the 1951-1980 average level) are
available for the period 1880-2009 as series 'gtemp2' in the astsa package in R.

(a) Plot the data and comment. Are there any unusual features worth noting?

(b) Perform a statistical analysis to determine a suitable model for this series.
Estimate the parameters using the maximum likelihood method.

(c) Is there evidence of any lack of fit in the models selected for this series?

(d) Can you suggest an alternative way to analyze this time series? How might an 
analysis of model generated forecasts impact your choice of model? 

8.9. Refer to the daily air quality measurements for New York, May to September 1973,
analyzed in Problem 7.10 of Chapter 7. Perform diagnostic checks to determine the
adequacy of the models fitted to average daily temperature and wind speed series.

8.10. Repeat the analysis in Problem 8.9 by performing diagnostic checks on the model,
or models, considered for the solar radiation series in Problem 7.11.

In Chapters 3-8, we have considered the properties of a class of linear stochastic models,
which are of value in representing stationary and nonstationary time series, and we have 
seen how these models may be used for forecasting. We then considered the practical 
problems of identification, fitting, and diagnostic checking that arise when relating these 
models to actual data. In this chapter, we apply these methods to analyzing and forecasting 
seasonal time series. A key focus is on seasonal multiplicative time series models that 
account for time series dependence across seasons as well as between adjacent values 
in the series. These models are extensions of the ARIMA models discussed in earlier 
chapters. The methodology is illustrated using a time series commonly referred to as the 
airline data in the time series literature. We also describe an alternate structural component 
model approach to representing stochastic seasonal and trend behavior that includes the 
possibility of the components being deterministic. The chapter concludes with a brief 
discussion of regression models with autocorrelated errors. These models could include 
deterministic sine or cosine terms to describe the seasonal behavior of the series. 


9.1 PARSIMONIOUS MODELS FOR SEASONAL TIME SERIES 

Figure 9.1 shows monthly totals of international airline passengers for the 12-year period
from January 1949 to December 1960. This series was discussed by Brown (1962) and is
listed as Series G in Part Five of this book. The series is also included as series
"AirPassengers" in the R datasets package and is conveniently downloaded from there. The series
shows a marked seasonal pattern since travel is at its highest in the late summer months,
while a secondary peak occurs in the spring. Many other series, particularly sales data,
show similar seasonal characteristics.


Time Series Analysis: Forecasting and Control, Fifth Edition. George E. P. Box, Gwilym M. Jenkins, 
Gregory C. Reinsel, and Greta M. Ljung 

©2016 John Wiley & Sons, Inc. Published 2016 by John Wiley & Sons, Inc.

FIGURE 9.1 Totals of international airline passengers in thousands (Series G).


In general, we say that a series exhibits periodic behavior with period s, when similarities 
in the series occur after s basic time intervals. In the above example, the basic time interval 
is 1 month and the period is s = 12 months. However, examples occur when s can take on 
other values. For example, s = 4 for quarterly data showing seasonal effects within years. 
It sometimes happens that there is more than one period. Thus, because bills tend to be paid 
monthly, we would expect weekly business done by a bank to show a periodicity of about 
4 within months, while monthly business shows a periodicity of 12. 

9.1.1 Fitting Versus Forecasting 

A common method of analyzing seasonal time series in the past was to decompose the
series arbitrarily into three components: a trend, a seasonal component, and a random
component. The trend might be fitted by a polynomial and the seasonal component by a
Fourier series. A forecast was then made by projecting these fitted functions. However,
such methods could give misleading results if applied indiscriminately. For example, we
have seen that the behavior of IBM stock prices in Series B is closely approximated by the
random walk model $\nabla z_t = a_t$, that is,

$$z_t = z_0 + \sum_{j=0}^{t-1} a_{t-j} \qquad (9.1.1)$$

This implies that $\hat{z}_t(l) = z_t$. In other words, the best forecast of future values of the stock is
very nearly today's price. While it is true that short segments of Series B look as if they might
be fitted by quadratic curves, this simply reflects the fact that a sum of random deviates
can sometimes have this appearance. There is no basis for the use of a quadratic forecast
function, which would produce very poor forecasts for this particular series. Similarly,
while deterministic trend and seasonal components can provide a good fit to the data, they
are often too rigid when it comes to forecasting. In this section, we introduce a seasonal
time series model that requires very few parameters and avoids the assumption of a trend
and seasonal component that remains fixed over time.

9.1.2 Seasonal Models Involving Adaptive Sines and Cosines 

The general linear model

$$z_t = \sum_{j=1}^{\infty}\pi_j z_{t-j} + a_t = \sum_{j=1}^{\infty}\psi_j a_{t-j} + a_t \qquad (9.1.2)$$

with suitable values for the coefficients $\pi_j$ and $\psi_j$, can be used to describe many seasonal
time series. The problem is to choose a suitable parsimonious parameterization for such
models. We have seen that for nonseasonal series, it is usually possible to obtain a useful
and parsimonious representation in the form

$$\varphi(B)z_t = \theta(B)a_t \qquad (9.1.3)$$

Moreover, the generalized autoregressive operator $\varphi(B)$ determines the eventual forecast
function, which is the solution of the difference equation

$$\varphi(B)\hat{z}_t(l) = 0$$

where $B$ is understood to operate on $l$. In representing seasonal behavior, we want the
forecast function to trace out a periodic pattern. A first thought might be that $\varphi(B)$ should
produce a forecast function consisting of a mixture of sines and cosines, and possibly
mixed with polynomial terms, to allow changes in the level of the series and changes in
the seasonal pattern. Such a forecast function could arise naturally within the structure of
the general model (9.1.3). For example, with monthly data, a forecast function that is a sine
wave with a 12-month period, adaptive in phase and amplitude, will satisfy the difference
equation

$$(1 - \sqrt{3}B + B^2)\hat{z}_t(l) = 0$$

where $B$ is understood to operate on $l$. However, periodic behavior may not be economically
represented by mixtures of sines and cosines. Many sine-cosine components would, for
example, be needed to represent sales data affected by Christmas, Easter, and other seasonal
buying. To take an extreme case, sales of fireworks in Britain are largely confined to the
weeks immediately before November 5, when the abortive attempt of Guy Fawkes to blow
up the Houses of Parliament is celebrated. An attempt to represent the "single spike" of
fireworks sales data directly by sines and cosines might be unprofitable. It is clear that a
more careful consideration of the problem is needed.

Now, in our previous analysis, we have not necessarily estimated all the components
of $\varphi(B)$. Where differencing $d$ times was needed to induce stationarity, we have written
$\varphi(B) = \phi(B)(1 - B)^d$, which is equivalent to setting $d$ roots of the equation $\varphi(B) = 0$
equal to unity. When such a representation proved adequate, we could proceed with the
simpler analysis of $w_t = \nabla^d z_t$. Thus, we have used $\nabla = 1 - B$ as a simplifying operator. In
other problems, different types of simplifying operators might be appropriate. For example,
the consumption of fuel oil for heat is highly dependent on ambient temperature, which,
because the Earth revolves around the sun, is known to follow approximately a sine wave with
period of 12 months. In analyzing sales of fuel oil, it might then make sense to introduce
$1 - \sqrt{3}B + B^2$ as a simplifying operator, constituting one of the contributing components
of the generalized autoregressive operator $\varphi(B)$. If such a representation proved useful, we
could then proceed with the simpler analysis of $w_t = (1 - \sqrt{3}B + B^2)z_t$. This operator is
of the homogeneous nonstationary variety, having zeros $e^{\pm i(2\pi/12)}$ on the unit circle.
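
The statement about the zeros of $1 - \sqrt{3}B + B^2$ is easy to confirm numerically. The following Python sketch, with a hypothetical helper name, solves the quadratic in $B$ and checks the modulus and angle of its roots; it is offered only as an illustration of the claim.

```python
import cmath
import math

def quadratic_roots(c0, c1, c2):
    """Roots in B of c0 + c1*B + c2*B**2 = 0 (complex roots allowed)."""
    disc = cmath.sqrt(c1 * c1 - 4.0 * c2 * c0)
    return ((-c1 + disc) / (2.0 * c2), (-c1 - disc) / (2.0 * c2))

# Simplifying operator 1 - sqrt(3) B + B^2
roots = quadratic_roots(1.0, -math.sqrt(3.0), 1.0)

moduli = [abs(r) for r in roots]               # both approximately 1
angles = [abs(cmath.phase(r)) for r in roots]  # both approximately 2*pi/12
period = 2.0 * math.pi / angles[0]             # approximately 12 (months)
```

Both roots have modulus 1 and angle $2\pi/12$, so the operator generates a sine wave of period 12 in the forecast function, as stated.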


9.1.3 General Multiplicative Seasonal Model 

Simplifying Operator $1 - B^s$. The fundamental fact about seasonal time series with period
$s$ is that observations that are $s$ intervals apart are similar. Therefore, one can expect that
the operation $B^s z_t = z_{t-s}$ will play a particularly important role in the analysis of seasonal
series. Furthermore, since nonstationarity is to be expected in the series $z_t, z_{t-s}, z_{t-2s}, \ldots$,
the simplifying operation

$$\nabla_s z_t = (1 - B^s)z_t = z_t - z_{t-s}$$

should be useful. This nonstationary operator $1 - B^s$ has $s$ zeros $e^{i(2\pi k/s)}$ ($k = 0, 1, \ldots, s - 1$)
evenly spaced on the unit circle. Moreover, the eventual forecast function satisfies
$(1 - B^s)\hat{z}_t(l) = 0$ and so may (but need not) be represented by a full complement of sines
and cosines:

$$\hat{z}_t(l) = b_0^{(t)} + \sum_{j=1}^{[s/2]}\left[b_{1j}^{(t)}\cos\left(\frac{2\pi jl}{s}\right) + b_{2j}^{(t)}\sin\left(\frac{2\pi jl}{s}\right)\right]$$

where the $b$'s are adaptive coefficients, and where $[s/2] = \frac{1}{2}s$ if $s$ is even and $[s/2] =
\frac{1}{2}(s - 1)$ if $s$ is odd.


Multiplicative Model. When a series exhibits seasonal behavior with known periodicity s, 
it is useful to display the data in the form of a table containing s columns, such as Table 9.1, 
which shows the logarithms of the airline data. For seasonal data, special care is needed in 
selecting an appropriate transformation. In this example, data analysis supports the use of 
the logarithm (see Section 9.3.5). 

The arrangement of Table 9.1 emphasizes the fact that, in periodic data, there are not one 
but two time intervals of importance. For this example, these intervals correspond to months 
and years. Specifically, we expect relationships to occur (a) between the observations for 
successive months in a particular year and (b) between the observations for the same month 
in successive years. The situation is somewhat like that in a two-way analysis of variance 
model, where similarities can be expected between observations in the same column and 
between observations in the same row. 

For the airline data, the seasonal effect implies that an observation for a particular
month, say April, is related to the observations for previous Aprils. Suppose that the $t$-th
observation $z_t$ is for the month of April. We might be able to link this observation $z_t$ to
observations in previous Aprils by a model of the form

$$\Phi(B^s)\nabla_s^D z_t = \Theta(B^s)\alpha_t \qquad (9.1.4)$$



TABLE 9.1 Natural Logarithms of Monthly Passenger Totals (Measured in Thousands) in International Air Travel (Series G)

        Jan.    Feb.    Mar.    Apr.    May     June    July    Aug.    Sept.   Oct.    Nov.    Dec.
1949    4.718   4.771   4.883   4.860   4.796   4.905   4.997   4.997   4.913   4.779   4.644   4.771
1950    4.745   4.836   4.949   4.905   4.828   5.004   5.136   5.136   5.063   4.890   4.736   4.942
1951    4.977   5.011   5.182   5.094   5.147   5.182   5.293   5.293   5.215   5.088   4.984   5.112
1952    5.142   5.193   5.263   5.199   5.209   5.384   5.438   5.489   5.342   5.252   5.147   5.268
1953    5.278   5.278   5.464   5.460   5.434   5.493   5.576   5.606   5.468   5.352   5.193   5.303
1954    5.318   5.236   5.460   5.245   5.455   5.576   5.710   5.680   5.557   5.434   5.313   5.434
1955    5.489   5.451   5.587   5.595   5.598   5.753   5.897   5.849   5.743   5.613   5.648   5.628
1956    5.649   5.624   5.759   5.746   5.762   5.924   6.023   6.004   5.872   5.724   5.602   5.724
1957    5.753   5.707   5.875   5.852   5.872   6.045   6.142   6.146   6.001   5.849   5.720   5.817
1958    5.829   5.762   5.892   5.852   5.894   6.075   6.196   6.225   6.001   5.883   5.737   5.820
1959    5.886   5.835   6.006   5.981   6.040   6.157   6.306   6.326   6.138   6.009   5.892   6.004
1960    6.033   5.969   6.038   6.133   6.157   6.282   6.433   6.407   6.230   6.133   5.966   6.068

where $s = 12$, $\nabla_s = 1 - B^s$, and $\Phi(B^s)$, $\Theta(B^s)$ are polynomials in $B^s$ of degrees $P$ and $Q$,
respectively, and satisfying stationarity and invertibility conditions. Similarly, a model

$$\Phi(B^s)\nabla_s^D z_{t-1} = \Theta(B^s)\alpha_{t-1} \qquad (9.1.5)$$

might be used to link the current behavior for March with previous March observations,
and so on, for each of the 12 months. Moreover, it is usually reasonable to assume that the
parameters $\Phi$ and $\Theta$ contained in these monthly models would be approximately the same
for each month.

Now the error components, $\alpha_t, \alpha_{t-1}, \ldots$, in these models would not in general be
uncorrelated. For example, the total of airline passengers in April 1960, while related to previous
April totals, would also be related to totals in March 1960, February 1960, January 1960,
and so on. Thus, we would expect that $\alpha_t$ in (9.1.4) would be related to $\alpha_{t-1}$ in (9.1.5) and to
$\alpha_{t-2}$, and so on. Therefore, to account for such relationships, we introduce a second model

$$\phi(B)\nabla^d\alpha_t = \theta(B)a_t \qquad (9.1.6)$$

where now $a_t$ is a white noise process and $\phi(B)$ and $\theta(B)$ are polynomials in $B$ of degrees $p$
and $q$, respectively, and satisfying stationarity and invertibility conditions, and $\nabla = \nabla_1 =
1 - B$.

Substituting (9.1.6) in (9.1.4), we obtain a general multiplicative model

$$\phi_p(B)\Phi_P(B^s)\nabla^d\nabla_s^D z_t = \theta_q(B)\Theta_Q(B^s)a_t \qquad (9.1.7)$$

where, for this particular example, $s = 12$. Also, the subscripts $p$, $P$, $q$, and $Q$ have been
added to indicate the orders of the various operators. The resulting multiplicative process
will be said to be of order $(p, d, q) \times (P, D, Q)_s$. A similar argument can be used to obtain
models with three or more periodic components to take care of multiple seasonalities.

In the next two sections, we examine some basic forms of the seasonal model introduced
above and demonstrate their potential for forecasting. We also consider the problems of
identification, estimation, and diagnostic checking that arise in relating such models to data.
No new principles are needed to do this, merely an application of the procedures and ideas
already discussed in Chapters 6-8. This is illustrated in the next section where a seasonal
ARIMA model of order $(0,1,1) \times (0,1,1)_{12}$ is used to represent the airline data.


9.2 REPRESENTATION OF THE AIRLINE DATA BY A MULTIPLICATIVE
$(0,1,1) \times (0,1,1)_{12}$ MODEL

9.2.1 Multiplicative $(0,1,1) \times (0,1,1)_{12}$ Model

We have seen that a simple and widely applicable stochastic model for the analysis of 
nonstationary time series, which contains no seasonal component, is the IMA(0, 1, 1) 
process. Suppose, following the argument presented above, that we have a seasonal time 
series and employ the model 


$$\nabla_{12}z_t = (1 - \Theta B^{12})\alpha_t$$





for linking $z$'s 1 year apart. Suppose further that we employ a similar model

$$\nabla\alpha_t = (1 - \theta B)a_t$$

for linking $\alpha$'s 1 month apart, where in general $\theta$ and $\Theta$ will have different values. Then,
on combining these expressions, we obtain the seasonal multiplicative model

$$\nabla\nabla_{12}z_t = (1 - \theta B)(1 - \Theta B^{12})a_t \qquad (9.2.1)$$

of order $(0,1,1) \times (0,1,1)_{12}$. The model written explicitly is

$$z_t - z_{t-1} - z_{t-12} + z_{t-13} = a_t - \theta a_{t-1} - \Theta a_{t-12} + \theta\Theta a_{t-13} \qquad (9.2.2)$$

The invertibility region for this model, required by the condition that the roots of
$(1 - \theta B)(1 - \Theta B^{12}) = 0$ lie outside the unit circle, is defined by the inequalities $-1 < \theta < 1$
and $-1 < \Theta < 1$. Note that the moving average operator $(1 - \theta B)(1 - \Theta B^{12}) = 1 - \theta B -
\Theta B^{12} + \theta\Theta B^{13}$, on the right-hand side of (9.2.1), is of order $q + sQ = 1 + 12(1) = 13$.
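
Both sides of the explicit form (9.2.2) can be illustrated with a short Python sketch. The function names below are hypothetical; the first applies the differencing $\nabla\nabla_s$ to a list of observations and the second returns the coefficients of the expanded moving average operator, so this is an illustration of the algebra rather than a fitting routine.

```python
def seasonal_double_difference(z, s=12):
    """Return w_t = z_t - z_{t-1} - z_{t-s} + z_{t-s-1}, the series obtained by
    applying (1 - B)(1 - B^s); the first s + 1 values of z are used up."""
    return [z[t] - z[t - 1] - z[t - s] + z[t - s - 1]
            for t in range(s + 1, len(z))]

def multiplicative_ma_coefficients(theta, Theta, s=12):
    """Coefficients [c0, ..., c_{s+1}] of (1 - theta*B)(1 - Theta*B^s),
    i.e. 1 - theta*B - Theta*B^s + theta*Theta*B^(s+1)."""
    coeffs = [0.0] * (s + 2)
    coeffs[0] += 1.0
    coeffs[1] -= theta
    coeffs[s] -= Theta
    coeffs[s + 1] += theta * Theta
    return coeffs

def is_invertible(theta, Theta):
    """Invertibility region: -1 < theta < 1 and -1 < Theta < 1."""
    return abs(theta) < 1.0 and abs(Theta) < 1.0

ma = multiplicative_ma_coefficients(0.4, 0.6)   # order 13 operator
order = len(ma) - 1
```

With $\theta = 0.4$ and $\Theta = 0.6$ the only nonzero coefficients are at lags 0, 1, 12, and 13, the last being $0.4 \times 0.6 = 0.24$.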

We will show below that the logged airline data are well represented by a model of
this form, where to a sufficient approximation, $\theta = 0.4$, $\Theta = 0.6$, and $\sigma_a^2 = 1.34 \times 10^{-3}$.
However, as a preliminary, we first consider how this model, with these parameter
values inserted, can be used to forecast future values of the series.


9.2.2 Forecasting 

In Chapter 4, we saw that there are three basically different ways of considering the 
general model, each giving rise to a different way of viewing the forecast in Chapter 5. We 
consider now these three approaches for the forecasting of the seasonal model introduced 
above. 

Difference Equation Approach. Forecasts are best computed directly from the difference
equation itself. Thus, since

$$z_{t+l} = z_{t+l-1} + z_{t+l-12} - z_{t+l-13} + a_{t+l} - \theta a_{t+l-1} - \Theta a_{t+l-12} + \theta\Theta a_{t+l-13} \qquad (9.2.3)$$

after setting $\theta = 0.4$, $\Theta = 0.6$, the minimum mean square error forecast at lead time $l$ and
origin $t$ is given immediately by

$$\hat{z}_t(l) = [z_{t+l-1} + z_{t+l-12} - z_{t+l-13} + a_{t+l} - 0.4a_{t+l-1} - 0.6a_{t+l-12} + 0.24a_{t+l-13}] \qquad (9.2.4)$$

where

$$[z_{t+l}] = E[z_{t+l} \mid z_t, z_{t-1}, \ldots; \theta, \Theta]$$

is the conditional expectation of $z_{t+l}$ taken at origin $t$. In this expression, the parameters are
assumed to be known, and knowledge of the series $z_t, z_{t-1}, \ldots$ is assumed to extend into
the remote past.

Practical application depends upon the following facts:

1. Invertible models fitted to actual data usually yield forecasts that depend appreciably
only on recent values of the series.
2. The forecasts are insensitive to small changes in parameter values such as are
introduced by estimation errors.

Now

$$[z_{t+j}] = \begin{cases} z_{t+j} & j \le 0 \\ \hat{z}_t(j) & j > 0 \end{cases} \qquad (9.2.5)$$

$$[a_{t+j}] = \begin{cases} a_{t+j} & j \le 0 \\ 0 & j > 0 \end{cases} \qquad (9.2.6)$$

Thus, to obtain the forecasts, we simply replace unknown $z$'s by forecasts and unknown
$a$'s by zeros. The known $a$'s are, of course, the one-step-ahead forecast errors already
computed, that is, $a_t = z_t - \hat{z}_{t-1}(1)$.

For example, to obtain the 3-months-ahead forecast, we have

$$z_{t+3} = z_{t+2} + z_{t-9} - z_{t-10} + a_{t+3} - 0.4a_{t+2} - 0.6a_{t-9} + 0.24a_{t-10}$$

Taking conditional expectations at the origin $t$ gives

$$\hat{z}_t(3) = \hat{z}_t(2) + z_{t-9} - z_{t-10} - 0.6a_{t-9} + 0.24a_{t-10}$$

Substituting $a_{t-9} = z_{t-9} - \hat{z}_{t-10}(1)$ and $a_{t-10} = z_{t-10} - \hat{z}_{t-11}(1)$ on the right-hand side also
yields

$$\hat{z}_t(3) = \hat{z}_t(2) + 0.4z_{t-9} - 0.76z_{t-10} + 0.6\hat{z}_{t-10}(1) - 0.24\hat{z}_{t-11}(1) \qquad (9.2.7)$$

which expresses the forecast in terms of previous $z$'s and previous forecasts of $z$'s.
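
The difference equation procedure of (9.2.3) to (9.2.6) can be sketched as a short recursion. The Python function below is a hypothetical illustration, not the computation used for the figures in the text: it assumes the first $s + 1$ shocks are zero as start-up values (the text assumes instead that the series extends into the remote past), computes the one-step-ahead errors recursively, and then replaces unknown $z$'s by forecasts and unknown $a$'s by zeros.

```python
def airline_forecasts(z, theta, Theta, max_lead, s=12):
    """Forecasts for leads 1..max_lead from the origin at the last observation,
    for the model (1 - B)(1 - B^s) z_t = (1 - theta*B)(1 - Theta*B^s) a_t.

    Returns (forecasts, a) where a holds the one-step-ahead forecast errors.
    Assumption: the first s + 1 values of a are set to zero.
    """
    n = len(z)
    if n < s + 2:
        raise ValueError("need at least s + 2 observations")
    a = [0.0] * n
    for t in range(s + 1, n):
        one_step = (z[t - 1] + z[t - s] - z[t - s - 1]
                    - theta * a[t - 1] - Theta * a[t - s]
                    + theta * Theta * a[t - s - 1])
        a[t] = z[t] - one_step          # a_t = z_t - z_hat_{t-1}(1)
    z_ext, a_ext = list(z), list(a)
    for _ in range(max_lead):
        t = len(z_ext)
        z_ext.append(z_ext[t - 1] + z_ext[t - s] - z_ext[t - s - 1]
                     - theta * a_ext[t - 1] - Theta * a_ext[t - s]
                     + theta * Theta * a_ext[t - s - 1])
        a_ext.append(0.0)               # future shocks replaced by zero
    return z_ext[n:], a
```

For instance, `airline_forecasts(log_series, 0.4, 0.6, 36)` would return 36 forecasts of a logged monthly series held in the (hypothetical) list `log_series`. Because the model is invertible, the effect of the zero start-up values dies out as the recursion proceeds.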

Figure 9.2 shows the forecasts for lead times up to 36 months, all made at the arbitrarily 
selected origin, July 1957. We see that the simple model, containing only two parameters, 
faithfully reproduces the seasonal pattern and supplies excellent forecasts. It is to be 
remembered, of course, that like all predictions obtained from the general linear stochastic 
model, the forecast function is adaptive. When changes occur in the seasonal pattern, these 
will be appropriately projected into the forecast. It will be noticed that when the
1-month-ahead forecast is too high, there is a tendency for all future forecasts from that point to
be high. This is to be expected because, as has been noted in Appendix A5.1, forecast
errors from the same origin, but for different lead times, are highly correlated. Of course, 
a forecast for a long lead time, such as 36 months, may necessarily contain a fairly large 
error. However, in practice, an initially remote forecast will be updated continually, and as 
the lead shortens, greater accuracy will be possible. 

The preceding forecasting procedure is robust to moderate changes in the parameter
values. Thus, if we used $\theta = 0.5$ and $\Theta = 0.5$, instead of $\theta = 0.4$ and $\Theta = 0.6$, the forecasts
would not be greatly affected. This is true even for forecasts made several steps ahead
(e.g., 12 months). The approximate effect on the one-step-ahead forecasts of modifying
the values of the parameters can be seen by studying the sum-of-squares surface. Thus, we
know that the approximate confidence region for the $k$ parameters $\boldsymbol{\beta}$ is bounded, in general,
by the contour $S(\boldsymbol{\beta}) = S(\hat{\boldsymbol{\beta}})[1 + \chi_\varepsilon^2(k)/n]$, which includes the true parameter point with
probability $1 - \varepsilon$. Therefore, we know that, had the true parameter values been employed,
with this same probability the mean square of the one-step-ahead forecast errors could not
have been increased by a factor greater than $1 + \chi_\varepsilon^2(k)/n$.

FIGURE 9.2 Airline data with forecasts for 1, 2, 3, ..., 36 months ahead, all made from an arbitrary
selected origin, July 1957.

Forecast Function, Its Updating, and the Forecast Error Variance. In practice, the 
difference equation procedure is by far the simplest and most convenient way for actually 
computing forecasts and updating them. However, the difference equation itself does not 
reveal very much about the nature of the forecasts and their updating. To cast light on these 
aspects, we now consider the forecasts from other points of view. 

Forecast Function. Using (5.1.12) yields $z_{t+l} = \hat{z}_t(l) + e_t(l)$, where

$$e_t(l) = a_{t+l} + \psi_1 a_{t+l-1} + \cdots + \psi_{l-1}a_{t+1} \qquad (9.2.8)$$

Now, the moving average operator on the right-hand side of (9.2.1) is of order 13. Hence,
for $l > 13$, the forecasts satisfy the difference equation

$$(1 - B)(1 - B^{12})\hat{z}_t(l) = 0 \qquad l > 13 \qquad (9.2.9)$$

where, in this equation, $B$ operates on the lead time $l$.

We now write $l = (r, m) = 12r + m$, $r = 0, 1, 2, \ldots$ and $m = 1, 2, \ldots, 12$, to represent a
lead time of $r$ years and $m$ months, so that, for example, $l = 15 = (1, 3)$. Then, the forecast
function, which is the solution of (9.2.9), with starting conditions given by the first 13
forecasts, is of the form

$$\hat{z}_t(l) = \hat{z}_t(r, m) = b_{0,m}^{(t)} + rb_1^{(t)} \qquad l > 0 \qquad (9.2.10)$$

This forecast function contains 13 adjustable coefficients $b_{0,1}^{(t)}, b_{0,2}^{(t)}, \ldots, b_{0,12}^{(t)}, b_1^{(t)}$. These
represent 12 monthly contributions and 1 yearly contribution and are determined by the
first 13 forecasts. The nature of this function is more clearly understood from Figure 9.3,
which shows a forecast function of this kind, but with period $s = 5$, so that there are six
adjustable coefficients $b_{0,1}^{(t)}, b_{0,2}^{(t)}, \ldots, b_{0,5}^{(t)}, b_1^{(t)}$.

FIGURE 9.3 Seasonal forecast function generated by the model $\nabla\nabla_s z_t = (1 - \theta B)(1 - \Theta B^s)a_t$,
with $s = 5$.

Equivalently, since $\hat{z}_t(l)$ satisfies (9.2.9) and the roots of $(1 - B)(1 - B^{12}) = 0$ are
$1, 1, -1, e^{\pm i(2\pi k/12)}$, $k = 1, \ldots, 5$, on the unit circle, the forecast function, as in (5.3.3), can
be represented as

$$\hat{z}_t(l) = \sum_{j=1}^{5}\left[b_{1j}^{(t)}\cos\left(\frac{2\pi jl}{12}\right) + b_{2j}^{(t)}\sin\left(\frac{2\pi jl}{12}\right)\right] + b_{16}^{(t)}(-1)^l + b_0^{(t)} + b_1^{*(t)}l$$

This shows that $\hat{z}_t(l)$ consists of a mixture of sinusoids at the seasonal frequencies
$2\pi j/12$, $j = 1, \ldots, 6$, plus a linear trend with slope $b_1^{*(t)}$. The coefficients $b_{1j}^{(t)}$, $b_{2j}^{(t)}$, $b_0^{(t)}$,
and $b_1^{*(t)}$ in the expression above are all adaptive with regard to the forecast origin $t$, being
determined by the first 13 forecasts. In comparison to (9.2.10), it is clear, for example, that
$b_1^{(t)} = 12b_1^{*(t)}$, and represents the annual rate of change in the forecasts $\hat{z}_t(l)$, whereas $b_1^{*(t)}$
is the monthly rate of change.


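The $\psi$ Weights. To determine updating formulas and to obtain the variance of the forecast 
error $e_t(l)$ in (9.2.8), we need the $\psi$ weights in the form $z_t = \sum_{j=0}^{\infty} \psi_j a_{t-j}$ of the model. 

We can write the moving average operator in (9.2.1) in the form 

$$(1 - \theta B)(1 - \Theta B^{12}) = (\nabla + \lambda B)(\nabla_{12} + \Lambda B^{12})$$

where $\lambda = 1 - \theta$, $\Lambda = 1 - \Theta$, $\nabla_{12} = 1 - B^{12}$. Hence, the model may be written as 

$$\nabla\nabla_{12} z_t = (\nabla + \lambda B)(\nabla_{12} + \Lambda B^{12})a_t$$

By equating coefficients in $\nabla\nabla_{12}\psi(B) = (\nabla + \lambda B)(\nabla_{12} + \Lambda B^{12})$, it can be seen that the 
$\psi$ weights satisfy $\psi_0 = 1$, $\psi_1 - \psi_0 = \lambda - 1$, $\psi_{12} - \psi_{11} - \psi_0 = \Lambda - 1$, $\psi_{13} - \psi_{12} - \psi_1 + \psi_0 = (\lambda - 1)(\Lambda - 1)$, 
and $\psi_j - \psi_{j-1} - \psi_{j-12} + \psi_{j-13} = 0$ otherwise. Thus, the $\psi$ weights 
for this process are 

$$\begin{aligned}
\psi_1 = \psi_2 = \cdots = \psi_{11} &= \lambda & \psi_{12} &= \lambda + \Lambda \\
\psi_{13} = \psi_{14} = \cdots = \psi_{23} &= \lambda(1 + \Lambda) & \psi_{24} &= \lambda(1 + \Lambda) + \Lambda \\
\psi_{25} = \psi_{26} = \cdots = \psi_{35} &= \lambda(1 + 2\Lambda) & \psi_{36} &= \lambda(1 + 2\Lambda) + \Lambda
\end{aligned}$$

and so on. Writing $\psi_j$ as $\psi_{r,m} = \psi_{12r+m}$, where $r = 0, 1, 2, \ldots$ and $m = 1, 2, \ldots, 12$ refer, 
respectively, to years and months, we obtain 

$$\psi_{r,m} = \lambda(1 + r\Lambda) + \delta\Lambda \tag{9.2.11}$$

where 

$$\delta = \begin{cases} 1 & \text{when } m = 12 \\ 0 & \text{when } m \neq 12 \end{cases}$$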

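Updating. The general updating formula (5.2.5) is 

$$\hat{z}_{t+1}(l) = \hat{z}_t(l + 1) + \psi_l a_{t+1}$$

Thus, if $m \neq s = 12$, 

$$b_{0,m}^{(t+1)} + r b_1^{(t+1)} = b_{0,m+1}^{(t)} + r b_1^{(t)} + (\lambda + r\lambda\Lambda)a_{t+1}$$

and on equating coefficients of $r$, the updating formulas are 

$$\begin{aligned}
b_{0,m}^{(t+1)} &= b_{0,m+1}^{(t)} + \lambda a_{t+1} \\
b_1^{(t+1)} &= b_1^{(t)} + \lambda\Lambda a_{t+1}
\end{aligned} \tag{9.2.12}$$

Alternatively, if $m = s = 12$, 

$$b_{0,12}^{(t+1)} + r b_1^{(t+1)} = b_{0,1}^{(t)} + (r + 1) b_1^{(t)} + (\lambda + \Lambda + r\lambda\Lambda)a_{t+1}$$

and in this case, 

$$\begin{aligned}
b_{0,12}^{(t+1)} &= b_{0,1}^{(t)} + b_1^{(t)} + (\lambda + \Lambda)a_{t+1} \\
b_1^{(t+1)} &= b_1^{(t)} + \lambda\Lambda a_{t+1}
\end{aligned} \tag{9.2.13}$$

In studying these relations, it should be remembered that $b_{0,m}^{(t+1)}$ will be the updated 
version of $b_{0,m+1}^{(t)}$. Thus, if the origin $t$ was January of a particular year, $b_{0,2}^{(t)}$ would be the 
coefficient for March. After a month had elapsed, we should move the forecast origin to 
February and the updated version for the March coefficient would now be $b_{0,1}^{(t+1)}$. 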


Forecast Error Variance. Knowledge of the $\psi$ weights enables us to calculate the variance 
of the forecast errors at any lead time $l$, using the result (5.1.16), namely 

$$V(l) = (1 + \psi_1^2 + \cdots + \psi_{l-1}^2)\sigma_a^2 \tag{9.2.14}$$

Thus, setting $\lambda = 0.6$, $\Lambda = 0.4$, $\sigma_a^2 = 1.34 \times 10^{-3}$ in (9.2.11) and (9.2.14), the estimated 
standard deviations $\hat{\sigma}(l)$ of the forecast errors of the log airline data are readily calculated 
for different lead times. 
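
The calculation just described can be sketched in a few lines of R. The helper names psi_weight and forecast_var are hypothetical, introduced only for this illustration; the numerical inputs are the values quoted in the text.

```r
# Forecast error standard deviations from (9.2.11) and (9.2.14).
lambda <- 0.6; Lambda <- 0.4; sigma2 <- 1.34e-3

# psi_j written as psi_{r,m} with j = 12 r + m, m = 1, ..., 12  (hypothetical helper)
psi_weight <- function(j) {
  r <- (j - 1) %/% 12
  m <- (j - 1) %% 12 + 1
  lambda * (1 + r * Lambda) + (m == 12) * Lambda
}

# V(l) = (1 + psi_1^2 + ... + psi_{l-1}^2) * sigma_a^2  (hypothetical helper)
forecast_var <- function(l) sigma2 * (1 + sum(psi_weight(seq_len(l - 1))^2))

lead <- c(1, 2, 3, 6, 12, 24, 36)
sd_l <- sqrt(sapply(lead, forecast_var))
print(data.frame(lead = lead, sd = sd_l))
```

At lead time 1 the standard deviation is simply $\sigma_a$, and it grows with the lead time because every additional $\psi_j^2$ term is positive.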


Forecasts as a Weighted Average of Previous Observations. If we write the model in the 
form 

$$z_t = \sum_{j=1}^{\infty} \pi_j z_{t-j} + a_t$$

the one-step-ahead forecast is 

$$\hat{z}_t(1) = \sum_{j=1}^{\infty} \pi_j z_{t+1-j}$$

The $\pi$ weights may be obtained by equating coefficients in 

$$(1 - B)(1 - B^{12}) = (1 - \theta B)(1 - \Theta B^{12})(1 - \pi_1 B - \pi_2 B^2 - \cdots)$$

Thus, 

$$\begin{aligned}
\pi_j &= \theta^{j-1}(1 - \theta) \qquad j = 1, 2, \ldots, 11 \\
\pi_{12} &= \theta^{11}(1 - \theta) + (1 - \Theta) \\
\pi_{13} &= \theta^{12}(1 - \theta) - (1 - \theta)(1 - \Theta) \\
\pi_j &= \theta\pi_{j-1} + \Theta\pi_{j-12} - \theta\Theta\pi_{j-13} \qquad j \geq 14
\end{aligned} \tag{9.2.15}$$

These weights are plotted in Figure 9.4 for the parameter values $\theta = 0.4$ and $\Theta = 0.6$. 
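
The recursion (9.2.15) is easy to evaluate numerically. The sketch below computes the first 48 weights for $\theta = 0.4$ and $\Theta = 0.6$ and then checks them by multiplying the polynomials back together; the truncation point J = 48 and all variable names are choices made for this illustration.

```r
# pi weights of the (0,1,1)x(0,1,1)_12 model from (9.2.15).
theta <- 0.4; Theta <- 0.6
J <- 48                                    # truncation point (illustrative)

pi_w <- numeric(J)
pi_w[1:11] <- theta^(0:10) * (1 - theta)
pi_w[12] <- theta^11 * (1 - theta) + (1 - Theta)
pi_w[13] <- theta^12 * (1 - theta) - (1 - theta) * (1 - Theta)
for (j in 14:J) {
  pi_w[j] <- theta * pi_w[j - 1] + Theta * pi_w[j - 12] - theta * Theta * pi_w[j - 13]
}

# Check: (1 - theta B)(1 - Theta B^12)(1 - pi_1 B - ...) should equal
# (1 - B)(1 - B^12) = 1 - B - B^12 + B^13 for every power of B up to J.
ma_poly <- c(1, -theta, rep(0, 10), -Theta, theta * Theta)   # powers 0..13
pi_poly <- c(1, -pi_w)                                        # powers 0..J
prod_poly <- sapply(0:J, function(k) {
  i <- 0:min(k, 13)
  sum(ma_poly[i + 1] * pi_poly[k - i + 1])
})
target <- c(1, -1, rep(0, 10), -1, 1, rep(0, J - 13))
max_abs_diff <- max(abs(prod_poly - target))
print(round(pi_w[1:14], 4))
```

A plot such as plot(pi_w, type = "h") shows the pattern of Figure 9.4: geometric decay within each year, interrupted by a positive weight at lag 12 followed by a negative weight at lag 13.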

The reason that the weight function takes the particular form shown in the figure may 
be understood as follows: the process (9.2.1) may be written as 

$$a_{t+1} = \left(1 - \frac{\lambda B}{1 - \theta B}\right)\left(1 - \frac{\Lambda B^{12}}{1 - \Theta B^{12}}\right) z_{t+1} \tag{9.2.16}$$

FIGURE 9.4 The $\pi$ weights for the $(0,1,1) \times (0,1,1)_{12}$ process fitted to the airline data ($\theta = 0.4$, $\Theta = 0.6$). 

We now use the notation $\mathrm{EWMA}_\lambda(z_t)$ to mean an exponentially weighted moving average, 
with parameter $\lambda = 1 - \theta$, of values $z_t, z_{t-1}, z_{t-2}, \ldots$, so that 

$$\mathrm{EWMA}_\lambda(z_t) = \frac{\lambda}{1 - \theta B} z_t = \lambda z_t + \lambda\theta z_{t-1} + \lambda\theta^2 z_{t-2} + \cdots$$

Similarly, we use $\mathrm{EWMA}_\Lambda(z_t)$ to mean an exponentially weighted moving average, with 
parameter $\Lambda = 1 - \Theta$, of values $z_t, z_{t-12}, z_{t-24}, \ldots$, so that 

$$\mathrm{EWMA}_\Lambda(z_t) = \frac{\Lambda}{1 - \Theta B^{12}} z_t = \Lambda z_t + \Lambda\Theta z_{t-12} + \Lambda\Theta^2 z_{t-24} + \cdots$$

Substituting $\hat{z}_t(1) = z_{t+1} - a_{t+1}$ in (9.2.16), we obtain 

$$\hat{z}_t(1) = \mathrm{EWMA}_\lambda(z_t) + \mathrm{EWMA}_\Lambda\left(z_{t-11} - \mathrm{EWMA}_\lambda(z_{t-12})\right) \tag{9.2.17}$$

Thus, the forecast is an EWMA taken over previous months, modified by a second EWMA 
of discrepancies found between similar monthly EWMAs and actual performance in previous 
years. As a particular case, if $\theta = 0$ ($\lambda = 1$), (9.2.17) would reduce to 

$$\begin{aligned}
\hat{z}_t(1) &= z_t + \mathrm{EWMA}_\Lambda(z_{t-11} - z_{t-12}) \\
&= z_t + \Lambda[(z_{t-11} - z_{t-12}) + \Theta(z_{t-23} - z_{t-24}) + \cdots]
\end{aligned}$$

which shows that first differences are forecast as the seasonal EWMA of first differences 
for similar months from previous years. 

For example, suppose that we were attempting to predict December sales for a department 
store. These sales would include a heavy component from Christmas buying. The first 
term on the right-hand side of (9.2.17) would be an EWMA taken over previous months up 
to November. However, we know this will be an underestimate, so we correct it by taking 
a second EWMA over previous years of the discrepancies between actual December sales 
and the corresponding monthly EWMAs taken over previous months in those years. 

The forecasts for lead times $l > 1$ can be generated from the $\pi$ weights by substituting 
forecasts of shorter lead time for unknown values, as displayed in the general expression 
(5.3.6) of Section 5.3.3. Alternatively, explicit values for the weights applied directly to 
$z_t, z_{t-1}, z_{t-2}, \ldots$ may be computed, for example, from (5.3.9) or from (A5.2.3). 

Calculation of Forecasts in R. Forecasts of future values of a time series that follows a 
multiplicative seasonal model can be calculated using R. A convenient option available in 
R is the command sarima.for() in the astsa package. For a series $z_t$ that follows a 
multiplicative model with period $s$, the command is sarima.for(z,n.ahead,p,d,q,P,D,Q,s), 
where n.ahead is the lead time. Thus, to generate forecasts up to 24 steps ahead for the 
logged airline series using the model $\nabla\nabla_{12} z_t = (1 - \theta B)(1 - \Theta B^{12})a_t$, the commands are 

> library(astsa) 

> ap=ts(seriesG,start=c(1949,1),frequency=12) 

> log.AP=log(ap) 

> m1=sarima.for(log.AP,24,0,1,1,0,1,1,12) 

> m1 # retrieves the stored output 

The output includes the forecasts (“pred”) and the prediction errors (“se”) of the forecasts. 
A graph of the forecasts with ±2 prediction error limits attached is provided as part of the 
output. Figure 9.5 shows the forecasts generated for the logged airline data using these 
commands. 

FIGURE 9.5 Forecasts along with ±2 prediction error limits for the logarithm of the airline data 
generated from the model $\nabla\nabla_{12} z_t = (1 - \theta B)(1 - \Theta B^{12})a_t$. 


9.2.3 Model Identification 

The identification of the nonseasonal IMA(0, 1, 1) process depends upon the fact that, 
after taking first differences, the autocorrelations for all lags beyond the first are zero. For 
the multiplicative $(0,1,1) \times (0,1,1)_{12}$ process (9.2.1), the only nonzero autocorrelations of 
$\nabla\nabla_{12} z_t$ are those at lags 1, 11, 12, and 13. In fact, from (9.2.2) the model is viewed as 

$$w_t = a_t - \theta a_{t-1} - \Theta a_{t-12} + \theta\Theta a_{t-13}$$

which is an MA model of order 13 for $w_t = \nabla\nabla_{12} z_t$. The autocovariances of $w_t$ are thus 
given by 

$$\begin{aligned}
\gamma_0 &= [1 + \theta^2 + \Theta^2 + (\theta\Theta)^2]\sigma_a^2 = (1 + \theta^2)(1 + \Theta^2)\sigma_a^2 \\
\gamma_1 &= [-\theta - \Theta(\theta\Theta)]\sigma_a^2 = -\theta(1 + \Theta^2)\sigma_a^2 \\
\gamma_{11} &= \theta\Theta\sigma_a^2 \\
\gamma_{12} &= [-\Theta - \theta(\theta\Theta)]\sigma_a^2 = -\Theta(1 + \theta^2)\sigma_a^2 \\
\gamma_{13} &= \theta\Theta\sigma_a^2
\end{aligned} \tag{9.2.18}$$

In particular, these expressions imply that 

$$\rho_1 = \frac{-\theta}{1 + \theta^2} \qquad \text{and} \qquad \rho_{12} = \frac{-\Theta}{1 + \Theta^2}$$

so that the value $\rho_1$ is unaffected by the presence of the seasonal MA factor $(1 - \Theta B^{12})$ in 
the model (9.2.1), while the value of $\rho_{12}$ is unaffected by the nonseasonal or regular MA 
factor $(1 - \theta B)$. 

FIGURE 9.6 Estimated autocorrelation function of logged airline data: (a) undifferenced series, 
(b) first differenced series, (c) seasonally differenced series, and (d) series with regular and seasonal 
differencing. 
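
The theoretical autocorrelations implied by (9.2.18) can be checked with the base R function ARMAacf, which accepts the MA(13) form of the model directly. The sketch below uses the illustrative values $\theta = 0.4$ and $\Theta = 0.6$; note that R's sign convention for moving average coefficients is the reverse of the one used in this text.

```r
# Theoretical ACF of w_t = (1 - theta B)(1 - Theta B^12) a_t via ARMAacf.
theta <- 0.4; Theta <- 0.6

# R's convention: w_t = a_t + ma_1 a_{t-1} + ... + ma_13 a_{t-13}
ma <- c(-theta, rep(0, 10), -Theta, theta * Theta)
rho <- unname(ARMAacf(ma = ma, lag.max = 13))   # rho[k + 1] is the lag-k value

rho1_formula  <- -theta / (1 + theta^2)
rho12_formula <- -Theta / (1 + Theta^2)
rho11_formula <- theta * Theta / ((1 + theta^2) * (1 + Theta^2))   # also lag 13

print(round(rho, 4))
```

Only lags 1, 11, 12, and 13 should be nonzero, with the lag 11 and lag 13 values equal, as (9.2.18) indicates.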

Figure 9.6 shows the estimated autocorrelations of the airline data for (a) the logged 
series, $z_t$, (b) the logged series differenced with respect to months only, $\nabla z_t$, (c) the 
logged series differenced with respect to years only, $\nabla_{12} z_t$, and (d) the logged series 
differenced with respect to months and years, $\nabla\nabla_{12} z_t$. The autocorrelations for $z_t$ are large 
and fail to die out at higher lags. While simple differencing reduces the correlations in 
general, a very heavy periodic component remains. This is evidenced particularly by very 
large correlations at lags 12, 24, 36, and 48. Simple differencing with respect to period 12 
results in correlations which are first persistently positive and then persistently negative. 
By contrast, the differencing $\nabla\nabla_{12}$ markedly reduces correlations throughout. 

The autocorrelations of $\nabla\nabla_{12} z_t$ exhibit spikes at lags 1 and 12, compatible with the 
theoretical autocovariances in (9.2.18) for model (9.2.1). As an alternative, however, the 
autocorrelations for $\nabla_{12} z_t$ might be viewed as dying out at a slow exponential rate beginning 
from lag one. Hence, there is also the possibility that $\nabla_{12} z_t$ may follow a nonseasonal 
ARMA(1, 1) model with $\phi$ relatively close to one, rather than a nonstationary IMA(0, 1, 1) 
model as in (9.2.1). However, in practice, the distinction between these two models may not 
be substantial and the ARMA(1, 1) alternative will not be explored further here. The choice 
between the nonstationary and stationary AR(1) factor could, in fact, be tested using unit root 
procedures similar to those described in Section 10.1 of the next chapter. 

The autocorrelation functions shown in Figure 9.6 were generated in R using the following 
commands: 

> library(astsa) 

> log.AP=log(ts(seriesG)) 

> par(mfrow=c(2,2) ) 

> acf(log.AP,50,main='(a)') 

> acf(diff(log.AP),50,main='(b)') 

> acf(diff(log.AP,12),50,main='(c)') 

> acf(diff(diff(log.AP,12)),50,main='(d)') 


On the assumption that the model is of the form (9.2.1), the variances for the estimated 
higher lag autocorrelations are approximated by Bartlett’s formula (2.1.15), which in this 
case becomes 

$$\operatorname{var}[r_k] \simeq \frac{1 + 2(\rho_1^2 + \rho_{11}^2 + \rho_{12}^2 + \rho_{13}^2)}{n} \qquad k > 13 \tag{9.2.19}$$

Substituting estimated correlations for the $\rho$’s and setting $n = 144 - 13 = 131$ in (9.2.19), 
where $n = 131$ is the number of differences $\nabla\nabla_{12} z_t$, we obtain a standard error $\hat{\sigma}(r) \simeq 0.11$. 
The dashed lines shown in Figure 9.6 are approximate two-standard-error limits computed 
under the assumption that there is no autocorrelation in the series so that $\operatorname{var}[r_k] = 1/n$. 


Preliminary Estimates. As with the nonseasonal model, by equating appropriate observed 
sample correlations to their expected values, approximate values can be obtained for the 
parameters $\theta$ and $\Theta$. On substituting the sample estimates $r_1 = -0.34$ and $r_{12} = -0.39$ in 
the expressions 

$$\rho_1 = \frac{-\theta}{1 + \theta^2} \qquad \rho_{12} = \frac{-\Theta}{1 + \Theta^2}$$

we obtain rough estimates $\hat{\theta} \simeq 0.39$ and $\hat{\Theta} \simeq 0.48$. A table summarizing the behavior of the 
autocorrelation function for some specimen seasonal models, useful in identification and 
in obtaining preliminary estimates of the parameters, is given in Appendix A9.1. 
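
The arithmetic behind these preliminary estimates amounts to solving the quadratic $\rho\theta^2 + \theta + \rho = 0$ and keeping the invertible root. A minimal sketch follows; the function name ma1_from_rho is hypothetical and introduced only for this illustration.

```r
# Invertible solution of rho = -theta / (1 + theta^2), i.e. rho*theta^2 + theta + rho = 0.
ma1_from_rho <- function(rho) {
  stopifnot(rho != 0, abs(rho) < 0.5)      # a real solution requires |rho| < 0.5
  (-1 + sqrt(1 - 4 * rho^2)) / (2 * rho)   # the root with |theta| < 1
}

theta0 <- ma1_from_rho(-0.34)   # from r_1
Theta0 <- ma1_from_rho(-0.39)   # from r_12
print(round(c(theta0 = theta0, Theta0 = Theta0), 2))
```

The other root of the quadratic is the reciprocal of the one returned and corresponds to a noninvertible model, which is why it is discarded.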


9.2.4 Parameter Estimation 

Contours of the sum-of-squares function $S(\theta, \Theta)$ for the model (9.2.1) fitted to the airline 
data are shown in Figure 9.7, together with the appropriate 95% confidence region. The 
least-squares estimates (LE) are seen to be very nearly $\hat{\theta} = 0.4$ and $\hat{\Theta} = 0.6$. The grid of 
values for $S(\theta, \Theta)$ was computed using the technique described in Chapter 7. It was shown 
there that given $n$ observations $\mathbf{w}$ from a linear process defined by 

$$\phi(B)w_t = \theta(B)a_t$$

the quadratic form $\mathbf{w}'\mathbf{M}_n\mathbf{w}$, which appears in the exponent of the likelihood, can always 
be expressed in terms of a sum of squares of the conditional expectation of $a$’s 
and a quadratic function of the conditional expectation of the $p + q$ initial values 
$\mathbf{e}_* = (w_{1-p}, \ldots, w_0, a_{1-q}, \ldots, a_0)'$, that is, 

$$\mathbf{w}'\mathbf{M}_n\mathbf{w} = S(\boldsymbol{\phi}, \boldsymbol{\theta}) = \sum_{t=-\infty}^{n} [a_t]^2 = \sum_{t=1}^{n} [a_t]^2 + [\mathbf{e}_*]'\boldsymbol{\Omega}^{-1}[\mathbf{e}_*]$$

where $[a_t] = [a_t \mid \mathbf{w}, \boldsymbol{\phi}, \boldsymbol{\theta}]$, $[\mathbf{e}_*] = [\mathbf{e}_* \mid \mathbf{w}, \boldsymbol{\phi}, \boldsymbol{\theta}]$, and $\operatorname{cov}[\mathbf{e}_*] = \sigma_a^2\boldsymbol{\Omega}$. Furthermore, $S(\boldsymbol{\phi}, \boldsymbol{\theta})$ 
plays a central role in the estimation of the parameters $\boldsymbol{\phi}$ and $\boldsymbol{\theta}$ from both a sampling theory 
and a likelihood or Bayesian point of view. 

FIGURE 9.7 Contours of $S(\theta, \Theta)$ with shaded 95% confidence region for the model 
$\nabla\nabla_{12} z_t = (1 - \theta B)(1 - \Theta B^{12})a_t$ fitted to the airline data. 

The computation for seasonal models follows precisely the course described in Section 
7.1.5 for nonseasonal models. The airline series has $N = 144$ observations. This reduces 
to $n = 131$ observations after the differencing $w_t = \nabla\nabla_{12} z_t$. The $[a_t]$ in $S(\theta, \Theta)$ can be 
calculated recursively using an approximate approach that iterates between the forward 
and backward versions of the $(0,1,1) \times (0,1,1)_{12}$ model. Alternatively, an exact method 
discussed in Appendix A7.3 and also used in Section 7.1.5 can be employed. For the present 
model, this involves first computing the conditional estimates of the $a_t$, using zero initial 
values $a_{-12}^0 = a_{-11}^0 = \cdots = a_0^0 = 0$, through a recursive calculation as 

$$a_t^0 = w_t + \theta a_{t-1}^0 + \Theta a_{t-12}^0 - \theta\Theta a_{t-13}^0 \qquad t = 1, \ldots, n \tag{9.2.20}$$

Then a backward recursion is used to obtain a series $u_t$ as 

$$u_t = a_t^0 + \theta u_{t+1} + \Theta u_{t+12} - \theta\Theta u_{t+13} \qquad t = n, \ldots, 1$$

using zero initial values $u_{n+1} = \cdots = u_{n+13} = 0$. Finally, the exact estimate for the vector 
of initial values $\mathbf{a}_* = (a_{-12}, \ldots, a_0)'$ is obtained by solving the equations $\mathbf{D}[\mathbf{a}_*] = \mathbf{F}'\mathbf{u}$, 
as described in (A7.3.12) of Appendix A7.3. Letting $\mathbf{h} = \mathbf{F}'\mathbf{u} = (h_{-12}, h_{-11}, \ldots, h_0)'$, the 
values $h_{-j}$ are computed as 

$$h_{-j} = -(\theta u_{-j+1} + \Theta u_{-j+12} - \theta\Theta u_{-j+13})$$

with $u_{-j} = 0$, $j \geq 0$. Once the initial values are estimated, the remaining $[a_t]$ values for 
$t = 1, 2, \ldots, n$ are calculated recursively as in (9.2.20), and hence the exact sum of squares 
$S(\theta, \Theta) = \sum_{t=-12}^{n} [a_t]^2$ is obtained. 
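
The first stage of this calculation, the conditional recursion (9.2.20) with zero starting values, can be sketched as follows. Two assumptions are made for the illustration: the built-in R data set AirPassengers is taken to stand in for Series G, and the function name cond_resid is hypothetical. The sketch gives only the conditional sum of squares, not the exact $S(\theta, \Theta)$, which also requires the backward recursion and the solution for the initial values.

```r
# Conditional residuals a_t^0 from (9.2.20), with a^0_{-12} = ... = a^0_0 = 0.
z <- log(AirPassengers)                       # assumed equivalent to logged Series G
w <- as.numeric(diff(diff(z, lag = 12)))      # w_t = (1 - B)(1 - B^12) z_t
n <- length(w)                                # 144 - 13 = 131

cond_resid <- function(w, theta, Theta) {     # hypothetical helper
  n <- length(w)
  a <- numeric(n + 13)                        # a[1:13] hold the zero starting values
  for (t in seq_len(n)) {
    k <- t + 13
    a[k] <- w[t] + theta * a[k - 1] + Theta * a[k - 12] - theta * Theta * a[k - 13]
  }
  a[-(1:13)]
}

a0 <- cond_resid(w, theta = 0.4, Theta = 0.6)
S_cond <- sum(a0^2)                           # conditional sum of squares

# Consistency check: applying the MA operator to a0 must reproduce w.
a_pad <- c(rep(0, 13), a0)
idx <- 14:(n + 13)
w_back <- a_pad[idx] - 0.4 * a_pad[idx - 1] - 0.6 * a_pad[idx - 12] +
  0.4 * 0.6 * a_pad[idx - 13]
max_abs_diff <- max(abs(w_back - w))
print(c(n = n, S_cond = S_cond))
```

Evaluating cond_resid over a grid of $(\theta, \Theta)$ values and contouring the resulting sums of squares gives a conditional analogue of Figure 9.7.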

Iterative Calculation of Least-Squares Estimates $\hat{\theta}$, $\hat{\Theta}$. While it is essential to plot 
sums-of-squares surfaces in a new situation, or whenever difficulties arise, an iterative 
linearization technique may be used in straightforward situations to supply the least-squares 
estimates and their approximate standard errors. The procedure has been set out in Section 
7.2.1, and no new difficulties arise in estimating the parameters of seasonal models. 

For the present example, we can write approximately 

$$a_{t,0} = (\theta - \theta_0)x_{t,1} + (\Theta - \Theta_0)x_{t,2} + a_t$$

where 

$$x_{t,1} = -\left.\frac{\partial a_t}{\partial\theta}\right|_{\theta_0,\Theta_0} \qquad x_{t,2} = -\left.\frac{\partial a_t}{\partial\Theta}\right|_{\theta_0,\Theta_0}$$

and where $\theta_0$ and $\Theta_0$ are guessed values and $a_{t,0} = [a_t \mid \theta_0, \Theta_0]$. As explained and illustrated 
in Section 7.2.2, the derivatives are most easily computed numerically. Alternatively, the 
derivatives could be obtained to any degree of accuracy by recursive calculation. 

Proceeding this way and using as starting values the preliminary estimates $\hat{\theta} = 0.39$, $\hat{\Theta} = 0.48$ 
obtained above, parameter estimates correct to two decimals are available in three 
iterations. The estimated variance of the residuals is $\hat{\sigma}_a^2 = 1.34 \times 10^{-3}$. From the inverse 
of the matrix of sums of squares and products of the $x$’s on the last iteration, the standard 
errors of the estimates may now be calculated. The least-squares estimates followed by 
their standard errors are then 

$$\hat{\theta} = 0.40 \pm 0.08 \qquad \hat{\Theta} = 0.61 \pm 0.07$$

agreeing closely with the values obtained from the sum-of-squares plot. 

Large-Sample Variances and Covariances for the Estimates. As in Section 7.2.6, large-sample 
formulas for the variances and covariances of the parameter estimates may be 
obtained. In this case, from the model equation $w_t = a_t - \theta a_{t-1} - \Theta a_{t-12} + \theta\Theta a_{t-13}$, the 
derivatives $x_{t,1} = -\partial a_t/\partial\theta$ are seen to satisfy 

$$x_{t,1} - \theta x_{t-1,1} - \Theta x_{t-12,1} + \theta\Theta x_{t-13,1} + a_{t-1} - \Theta a_{t-13} = 0$$

hence $(1 - \theta B)(1 - \Theta B^{12})x_{t,1} = -(1 - \Theta B^{12})a_{t-1}$, or simply $(1 - \theta B)x_{t,1} = -a_{t-1}$. Thus, 
using a similar derivation for $x_{t,2} = -\partial a_t/\partial\Theta$, we obtain that 

$$x_{t,1} = -(1 - \theta B)^{-1}a_{t-1} = -\sum_{j=0}^{\infty} \theta^j B^j a_{t-1}$$

$$x_{t,2} = -(1 - \Theta B^{12})^{-1}a_{t-12} = -\sum_{i=0}^{\infty} \Theta^i B^{12i} a_{t-12}$$

Therefore, for large samples, the information matrix is 

$$\mathbf{I}(\theta, \Theta) = n \begin{bmatrix} (1 - \theta^2)^{-1} & \theta^{11}(1 - \theta^{12}\Theta)^{-1} \\ \theta^{11}(1 - \theta^{12}\Theta)^{-1} & (1 - \Theta^2)^{-1} \end{bmatrix}$$

Provided that $|\theta|$ is not close to unity, the off-diagonal term is negligible, and approximate 
values for the variances and covariances of $\hat{\theta}$ and $\hat{\Theta}$ are 

$$V(\hat{\theta}) \simeq n^{-1}(1 - \theta^2) \qquad V(\hat{\Theta}) \simeq n^{-1}(1 - \Theta^2) \qquad \operatorname{cov}[\hat{\theta}, \hat{\Theta}] \simeq 0 \tag{9.2.21}$$

In the present example, substituting the values $\hat{\theta} = 0.40$, $\hat{\Theta} = 0.61$, and $n = 131$, we obtain 

$$V(\hat{\theta}) \simeq 0.0064 \qquad V(\hat{\Theta}) \simeq 0.0048$$

and 

$$\hat{\sigma}(\hat{\theta}) \simeq 0.08 \qquad \hat{\sigma}(\hat{\Theta}) \simeq 0.07$$

which, to this accuracy, are identical with the values obtained directly from the iteration. 
It is also interesting to note that the parameter estimates $\hat{\theta}$ and $\hat{\Theta}$, associated with months 
and years, respectively, are virtually uncorrelated. 
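
These large-sample calculations are simple enough to reproduce directly. The sketch below builds the $2 \times 2$ information matrix, including the small off-diagonal term, and inverts it; the variable names are chosen for this illustration.

```r
# Large-sample standard errors for theta-hat and Theta-hat from the information matrix.
theta <- 0.40; Theta <- 0.61; n <- 131

off <- theta^11 / (1 - theta^12 * Theta)      # off-diagonal term, very small here
info <- n * matrix(c(1 / (1 - theta^2), off,
                     off, 1 / (1 - Theta^2)), nrow = 2, byrow = TRUE)

V <- solve(info)                              # approximate covariance matrix
se <- sqrt(diag(V))
corr <- V[1, 2] / (se[1] * se[2])

print(round(diag(V), 4))
print(round(se, 2))
print(corr)
```

Keeping the off-diagonal term changes neither standard error at the accuracy quoted, and the implied correlation between the two estimates is of the order of $10^{-5}$.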

Parameter Estimation in R. The parameters of the model 

$$\nabla\nabla_{12} z_t = w_t = (1 - \theta B)(1 - \Theta B^{12})a_t$$

can be estimated in R using the command sarima(log.AP,p,d,q,P,D,Q,S=12) in the 
astsa package as demonstrated below. The resulting estimates of the two parameters $\theta$ 
and $\Theta$ are 0.40 and 0.56, respectively, with corresponding standard errors of 0.09 and 0.07. 
The full likelihood function, including the determinant, is used for parameter estimation, 
which accounts for the difference between the parameter estimates derived above and those 
obtained in R. Also, in viewing the output, it should be noted that R defines the moving 
average operators with positive signs, in contrast to the negative signs used in this text. 

> library(astsa) 

> log.AP=log(ts(seriesG)) 

> m1.AP=sarima(log.AP,0,1,1,0,1,1,S=12) 

> m1.AP # retrieves the stored output 

OUTPUT: 

Call: 

stats::arima(x=xdata,order=c(p,d,q),seasonal=list(order=c(P,D,Q), 
period=S),optim.control=list(trace=trc,REPORT=1,reltol=tol)) 

Coefficients: 

          ma1     sma1 
      -0.4018  -0.5569 
s.e.   0.0896   0.0731 

sigma^2 estimated as 0.001348: log likelihood=244.7, aic=-483.4 

9.2.5 Diagnostic Checking 

Before proceeding further, we check the adequacy of fit of the model by examining the 
residuals from the fitted model. 

Autocorrelation Checks. The standardized residuals calculated from the fitted model and 
the estimated autocorrelations of the residuals are shown in Figure 9.8. The figure is 
generated as part of the output from the estimation command “sarima” in R. The residual 
autocorrelations do not present evidence of any lack of fit, since none of the values fall 
outside the approximate two-standard-error limits of 0.18. This conclusion is also supported 
by the $p$ values of the portmanteau statistics $\tilde{Q} = n(n + 2)\sum_{k=1}^{K} r_k^2(\hat{a})/(n - k)$, which are 
shown for different values of $K$ in the last part of the graph. 
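
The portmanteau statistic can also be computed outside the sarima graphics. The sketch below is one way to do so with base R only, under several assumptions that are not part of the text: the built-in AirPassengers data stand in for Series G, the model is fitted with stats::arima, the first 13 start-up residuals are dropped so that $n = 131$, and $K = 24$ is an arbitrary choice.

```r
# Ljung-Box portmanteau check for the airline model, using base R only.
fit <- arima(log(AirPassengers), order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))

res <- as.numeric(residuals(fit))[-(1:13)]   # drop start-up residuals (a choice made here)
n <- length(res)                              # 131
K <- 24                                       # number of lags (illustrative)

# Built-in test; fitdf = 2 allows for the two estimated MA parameters.
lb <- Box.test(res, lag = K, type = "Ljung-Box", fitdf = 2)

# The same statistic written out as n(n + 2) * sum_k r_k^2 / (n - k).
r <- acf(res, lag.max = K, plot = FALSE)$acf[-1]
Q_manual <- n * (n + 2) * sum(r^2 / (n - 1:K))

print(lb)
print(Q_manual)
```

The two values of the statistic coincide, and the $p$ value is referred to a chi-square distribution with $K - 2$ degrees of freedom. The limits $\pm 2/\sqrt{n}$ with $n = 131$ are about $\pm 0.17$, consistent with the two-standard-error limits quoted above.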

Periodogram Check. The cumulative periodogram (see Section 8.2.5) for the residuals is 
shown in Figure 9.9. The Kolmogorov-Smirnov 5 and 25% probability limits, which as 
we have seen in Section 8.2.5 supply a very rough guide to the significance of apparent 
deviations, fail in this instance to indicate any significant departure from the assumed 
model. 


[Figure 9.8 panels: standardized residuals; ACF of residuals; normal Q-Q plot of standardized residuals; p values for Ljung-Box statistic.] 

FIGURE 9.8 Diagnostic checks on the residuals from the fitted model. 

FIGURE 9.9 Cumulative periodogram check on residuals from the model 
$\nabla\nabla_{12} z_t = (1 - 0.40B)(1 - 0.61B^{12})a_t$, fitted to the airline data. 

9.3 SOME ASPECTS OF MORE GENERAL SEASONAL ARIMA MODELS 
9.3.1 Multiplicative and Nonmultiplicative Models 

In previous sections, we discussed methods of dealing with seasonal time series, and in 
particular, we examined an example of a multiplicative model. We have seen how this 
model can provide a useful representation with remarkably few parameters. It now remains 
to study other seasonal models of this kind, and insofar as new considerations arise, the 
associated processes of identification, estimation, diagnostic checking, and forecasting. 

Suppose, in general, that we have a seasonal effect associated with period $s$. Then, the 
general class of multiplicative models may be typified in the manner shown in Figure 9.10. 
In the multiplicative model, it is assumed that the “between periods” development of the 
series is represented by some model 

$$\Phi_P(B^s)\nabla_s^D z_{r,m} = \Theta_Q(B^s)\alpha_{r,m}$$

while “within periods” the $\alpha$’s are related by 

$$\phi_p(B)\nabla^d\alpha_{r,m} = \theta_q(B)a_{r,m}$$

Obviously, we could change the order in which we considered the two types of models and 
in either case obtain the general multiplicative model 

$$\phi_p(B)\Phi_P(B^s)\nabla^d\nabla_s^D z_{r,m} = \theta_q(B)\Theta_Q(B^s)a_{r,m} \tag{9.3.1}$$

where $a_{r,m}$ is a white noise process with zero mean. In practice, the usefulness of models such 
as (9.3.1) depends on how far it is possible to parameterize actual time series parsimoniously 
in these terms. In fact, experience has shown that this is possible for a variety of seasonal 
time series coming from widely different sources. While the multiplicative model (9.2.1) 
has been found to fit many time series, other models of the form (9.3.1) have also been 
found to be useful in practice. 

FIGURE 9.10 Two-way table for multiplicative seasonal model. 

It is not possible to obtain a completely adequate fit with multiplicative models for all 
series. One modification that is sometimes useful allows the mixed moving average operator 
to be nonmultiplicative. By this is meant that we replace the operator $\theta_q(B)\Theta_Q(B^s)$ on the 
right-hand side of (9.3.1) by a more general moving average operator $\theta^*_{q^*}(B)$. Alternatively, 
or in addition, it may be necessary to replace the autoregressive operator $\phi_p(B)\Phi_P(B^s)$ 
on the left by a more general autoregressive operator $\phi^*_{p^*}(B)$. Some examples of 
nonmultiplicative models are given in Appendix A9.1. These are numbered 4, 4a, 5, and 5a. 

In those cases where a nonmultiplicative model is found necessary, experience suggests 
that the best-fitting multiplicative model can provide a good starting point from which to 
construct a better nonmultiplicative model. The situation is reminiscent of the problems 
encountered in analyzing two-way analysis of variance tables, where additivity of row and 
column constants may or may not be an adequate assumption, but may provide a good 
point of departure. 

Our general strategy for relating multiplicative or nonmultiplicative models to data is 
that which we have already discussed and illustrated in some detail in Section 9.2. Using 
the autocorrelation function for guidance: 

1. The series is differenced with respect to $\nabla$ and/or $\nabla_s$, so as to produce stationarity. 

2. By inspection of the autocorrelation function of the suitably differenced series, a 
tentative model is selected. 

3. From the values of appropriate autocorrelations of the differenced series, preliminary 
estimates of the parameters are obtained. These can be used as starting values in the 
search for the least-squares or maximum likelihood estimates. 

4. After fitting, the diagnostic checking process applied to the residuals either may lead 
to the acceptance of the tentative model or, alternatively, may suggest ways in which 
it can be improved, leading to refitting and repetition of the diagnostic checks. 

As a few practical guidelines for model specification, we note that for seasonal series 
the order of seasonal differencing $D$ needed would almost never be greater than one, and 
especially for monthly series with $s = 12$, the orders $P$ and $Q$ of the seasonal AR and MA 
operators $\Phi(B^s)$ and $\Theta(B^s)$ would rarely need to be greater than 1. This is particularly so 
when the series length of available data is not sufficient to warrant a more complicated 
form of model with $P > 1$ or $Q > 1$. 


9.3.2 Model Identification 

A useful aid in model identification is the list in Appendix A9.1 that gives the autocovariance 
structure of $w_t = \nabla^d\nabla_s^D z_t$ for a number of simple seasonal models. This list makes no claim 
to be comprehensive. However, it does include some frequently encountered models, and 
the reader should have no difficulty in discovering the characteristics of others that may 
seem useful. It should be emphasized that rather simple models, such as models 1 and 2 in 
the appendix, have provided adequate representations for many seasonal series. 

Since the multiplicative seasonal ARMA models for the differences $w_t = \nabla^d\nabla_s^D z_t$ may 
be viewed as special forms of ARMA models with orders $p + sP$ and $q + sQ$, their 
autocovariances can be derived from the principles of Chapter 3, as was done in the previous 
section for the MA model $w_t = a_t - \theta a_{t-1} - \Theta a_{t-12} + \theta\Theta a_{t-13}$. For further illustration, 
consider the model 

$$(1 - \phi B)w_t = (1 - \Theta B^s)a_t$$

which is a special form of ARMA model with AR order 1 and MA order $s$. First, since 
the $\psi$ weights for this model for $w_t$ satisfy $\psi_j - \phi\psi_{j-1} = 0$, $j = 1, \ldots, s - 1$, we have 
$\psi_j = \phi^j$, $j = 1, \ldots, s - 1$, as well as $\psi_s = \phi^s - \Theta$ and $\psi_j = \phi\psi_{j-1}$, $j > s$. It is then easy to 
see that the autocovariances for $w_t$ will satisfy 

$$\begin{aligned}
\gamma_0 &= \phi\gamma_1 + \sigma_a^2(1 - \Theta\psi_s) \\
\gamma_j &= \phi\gamma_{j-1} - \sigma_a^2\Theta\psi_{s-j} \qquad j = 1, \ldots, s \\
\gamma_j &= \phi\gamma_{j-1} \qquad j > s
\end{aligned} \tag{9.3.2}$$

Solving the first two equations for $\gamma_0$ and $\gamma_1$, we obtain 

$$\gamma_0 = \sigma_a^2\frac{1 - \Theta(\phi^s - \Theta) - \phi^s\Theta}{1 - \phi^2} = \sigma_a^2\frac{1 + \Theta^2 - 2\phi^s\Theta}{1 - \phi^2}$$

$$\gamma_1 = \sigma_a^2\frac{\phi[1 - \Theta(\phi^s - \Theta)] - \phi^{s-1}\Theta}{1 - \phi^2} = \sigma_a^2\frac{\phi(1 + \Theta^2 - \phi^s\Theta) - \phi^{s-1}\Theta}{1 - \phi^2}$$

with $\gamma_j = \phi\gamma_{j-1} - \sigma_a^2\Theta\phi^{s-j} = \phi^j\gamma_0 - \sigma_a^2\Theta\phi^{s-j}(1 - \phi^{2j})/(1 - \phi^2)$, $j = 1, \ldots, s$, and 
$\gamma_j = \phi\gamma_{j-1} = \phi^{j-s}\gamma_s$, $j > s$. Hence, in particular, for monthly data with $s = 12$ and $|\phi|$ not 
too close to one, the autocorrelation function $\rho_j$ for this process will behave, for low lags, 
similarly to that of a regular AR(1) process, $\rho_j \simeq \phi^j$ for small $j$, while the value of $\rho_{12}$ will 
be close to $-\Theta/(1 + \Theta^2)$. 
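
These closed-form expressions can be compared with the autocorrelations returned by the base R function ARMAacf. In the sketch below the values $\phi = 0.5$, $\Theta = 0.6$, and $s = 12$ are arbitrary choices made for illustration, and $\sigma_a^2$ is set to 1 because it cancels from the autocorrelations.

```r
# Autocorrelations of (1 - phi B) w_t = (1 - Theta B^s) a_t: closed form vs. ARMAacf.
phi <- 0.5; Theta <- 0.6; s <- 12; sigma2 <- 1     # illustrative values

gamma0 <- sigma2 * (1 + Theta^2 - 2 * phi^s * Theta) / (1 - phi^2)
j <- 1:s
gamma_lo <- phi^j * gamma0 -
  sigma2 * Theta * phi^(s - j) * (1 - phi^(2 * j)) / (1 - phi^2)   # lags 1..s
gamma_hi <- phi^(1:6) * gamma_lo[s]                                # lags s+1..s+6
rho_formula <- c(gamma_lo, gamma_hi) / gamma0

# R's MA sign convention is the reverse of the text's, hence -Theta at lag s.
rho_R <- unname(ARMAacf(ar = phi, ma = c(rep(0, s - 1), -Theta),
                        lag.max = s + 6))[-1]

max_abs_diff <- max(abs(rho_formula - rho_R))
print(round(rbind(formula = rho_formula, ARMAacf = rho_R)[, c(1:3, 11:13)], 4))
print(c(rho12 = rho_formula[12], approx = -Theta / (1 + Theta^2)))
```

The low-lag values decay roughly as $\phi^j$, and the lag 12 value is close to $-\Theta/(1 + \Theta^2)$, in line with the remark above.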

A fact of considerable utility in deriving autocovariances of a multiplicative process is 
that for such a process, the autocovariance generating function (3.1.11) is the product of 
the generating functions of the components. Thus, in (9.3.1) if the component models for 
$\nabla^d z_t$ and $\nabla_s^D\alpha_t$, 

$$\phi_p(B)\nabla^d z_t = \theta_q(B)\alpha_t$$

$$\Phi_P(B^s)\nabla_s^D\alpha_t = \Theta_Q(B^s)a_t$$

have autocovariance generating functions $\gamma(B)$ and $\Gamma(B^s)$, the autocovariance generating 
function for $w_t = \nabla^d\nabla_s^D z_t$ in (9.3.1) is 

$$\gamma(B)\Gamma(B^s)$$

Another point to be remembered is that it may be useful to parameterize more general 
models in terms of their departures from related multiplicative forms in a manner now 
illustrated. 

The three-parameter nonmultiplicative operator 

$$1 - \theta_1 B - \theta_{12}B^{12} - \theta_{13}B^{13} \tag{9.3.3}$$

employed in models 4 and 5 in the appendix may be written as 

$$(1 - \theta_1 B)(1 - \theta_{12}B^{12}) - kB^{13}$$

where 

$$k = \theta_1\theta_{12} - (-\theta_{13})$$

An estimate of $k$ that was large compared with its standard error would indicate the need for 
a nonmultiplicative model in which the value of $\theta_{13}$ is not tied to the values of $\theta_1$ and $\theta_{12}$. 
On the other hand, if $k$ is small, then on writing $\theta_1 = \theta$, $\theta_{12} = \Theta$, the model approximates 
the multiplicative $(0, 1, 1) \times (0, 1, 1)_{12}$ model. 


9.3.3 Parameter Estimation 

No new problems arise in the estimation of the parameters of general seasonal models. 
The unconditional sum of squares is computed quite generally by the methods set out 
fully in Section 7.1.5 and illustrated further in Section 9.2.4. As always, contour plotting 
can illuminate difficult situations. In well-behaved situations, iterative least squares with 
numerical determination of derivatives yields rapid convergence to the least-squares 
estimates, together with approximate variances and covariances of the estimates. Recursive 
procedures can be derived in each case, which allow direct calculation of derivatives, if 
desired. 

Large-Sample Variances and Covariances of the Estimates. The large-sample information 
matrix $\mathbf{I}(\boldsymbol{\phi}, \boldsymbol{\theta}, \boldsymbol{\Phi}, \boldsymbol{\Theta})$ is given by evaluating $E[\mathbf{X}'\mathbf{X}]$, where, as in Section 7.2.6, $\mathbf{X}$ is 
the $n \times (p + q + P + Q)$ matrix of derivatives with reversed signs. Thus, for the general 
multiplicative model 

$$a_t = \theta^{-1}(B)\Theta^{-1}(B^s)\phi(B)\Phi(B^s)w_t$$

where $w_t = \nabla^d\nabla_s^D z_t$, the required derivatives are 

$$\frac{\partial a_t}{\partial\theta_i} = \theta^{-1}(B)B^i a_t \qquad \frac{\partial a_t}{\partial\phi_j} = -\phi^{-1}(B)B^j a_t$$

$$\frac{\partial a_t}{\partial\Theta_i} = \Theta^{-1}(B^s)B^{si}a_t \qquad \frac{\partial a_t}{\partial\Phi_j} = -\Phi^{-1}(B^s)B^{sj}a_t$$

Approximate variances and covariances of the estimates are obtained as before, by inverting 
the matrix $\mathbf{I}(\boldsymbol{\phi}, \boldsymbol{\theta}, \boldsymbol{\Phi}, \boldsymbol{\Theta})$. 

9.3.4 Eventual Forecast Functions for Various Seasonal Models 

We now consider the characteristics of the eventual forecast functions for a number of 
seasonal models. For a seasonal model with single periodicity $s$, the eventual forecast 
function at origin $t$ for lead time $l$ is the solution of the difference equation 

$$\phi(B)\Phi(B^s)\nabla^d\nabla_s^D\hat{z}_t(l) = 0$$

Table 9.2 shows this solution for various choices of the difference equation; also shown is 
the number of initial values on which the behavior of the forecast function depends. 

In Figure 9.11, the behavior of each forecast function is illustrated for $s = 4$. It will 
be convenient to regard the lead time $l = rs + m$ as referring to a forecast $r$ years and $m$ 
quarters ahead. In the diagram, an appropriate number of initial values (required to start the 
forecast off and indicated by bold dots) has been set arbitrarily and the course of the forecast 
function traced to the end of the fourth period. When the difference equation involves an 
autoregressive parameter, its value has been set equal to 0.5. 

[Figure 9.11 panels (1) to (7), one for each autoregressive operator.] 

FIGURE 9.11 Behavior of the seasonal forecast function for various choices of the general seasonal 
autoregressive operator. 

TABLE 9.2 Eventual Forecast Functions for Various Generalized Autoregressive Operators 

Generalized Autoregressive      Eventual Forecast Function                                          Number of Initial Values on which 
Operator                        $\hat{z}(r, m)$ (a)                                                 Forecast Function Depends 

(1) $1 - \Phi B^s$              $\mu + (b_{0,m} - \mu)\Phi^r$                                       $s$ 
(2) $1 - B^s$                   $b_{0,m}$                                                           $s$ 
(3) $(1 - B)(1 - \Phi B^s)$     $b_0 + (b_{0,m} - b_0)\Phi^r + b_1\{(1 - \Phi^r)/(1 - \Phi)\}$      $s + 1$ 
(4) $(1 - B)(1 - B^s)$          $b_{0,m} + b_1 r$                                                   $s + 1$ 
(5) $(1 - \phi B)(1 - B^s)$     $b_{0,m} + b_1\phi^{m-1}\{(1 - \phi^{sr})/(1 - \phi^s)\}$           $s + 1$ 
(6) $(1 - B)(1 - B^s)^2$        $b_{0,m} + b_{1,m}r + \frac{1}{2}b_2 r(r - 1)$                      $2s + 1$ 
(7) $(1 - B)^2(1 - B^s)$        $b_{0,m} + [b_1 + (m - 1)b_2]r + \frac{1}{2}b_2 s r(r - 1)$         $s + 2$ 

(a) Coefficients $b$ are all adaptive and depend upon forecast origin $t$. 

The constants $b_{0,m}$, $b_1$, and so on, appearing in the solutions in Table 9.2, should strictly 
be indicated by $b_{0,m}^{(t)}$, $b_1^{(t)}$, and so on, since each one depends on the origin $t$ of the forecast, 
and these constants are adaptively modified each time the origin changes. The superscript 
$t$ has been omitted temporarily to simplify notation. 

The operator labeled (1) in Table 9.2 is stationary, with the model containing a fixed 
mean \i. It is autoregressive in the seasonal pattern, and the forecast function decays with 
each period, approaching closer and closer to the mean. 

Operator (2) in Table 9.2 is nonstationary in the seasonal component. The forecasts for 
a particular quarter are linked from year to year by a polynomial of degree 0. Thus, the 
basic forecast of the seasonal component is exactly reproduced in forecasts of future years. 

Operator (3) in Table 9.2 is nonstationary with respect to the basic time interval but 
stationary in the seasonal component. Operator (3) in Figure 9.11 shows the general level 
of the forecast approaching asymptotically the new level

$$b_0 + \frac{b_1}{1 - \Phi}$$

where, at the same time, the superimposed predictable component of the stationary seasonal 
effect dies out exponentially. 

In Table 9.2, operator (4) is the limiting case of the operator (3) as $\Phi$ approaches unity.
The operator is nonstationary with respect to both the basic time interval and the periodic
component. The basic initial forecast pattern is reproduced, as is the incremental yearly
increase. This is the type of forecast function given by the multiplicative $(0,1,1) \times (0,1,1)_{12}$
process fitted to the airline data.

Operator (5) is nonstationary in the seasonal pattern but stationary with respect to the 
basic time interval. The pattern approaches exponentially an asymptotic basic pattern 

$$\hat{z}_t(\infty, m) = b_{0m} + \frac{b_1 \phi^{m-1}}{1 - \phi^s}$$

Operator (6) is nonstationary in both the basic time interval and the seasonal component.
An overall quadratic trend occurs over years, and a particular kind of modification occurs
in the seasonal pattern. Individual quarters not only have their own level $b_{0m}$ but also
their own rate of change of level $b_{1m}$. Therefore, when this kind of forecast function is
appropriate, we can have a situation where, for example, as the lead time is increased, the 
difference in summer over spring sales can be forecast to increase from one year to the 
next, while at the same time, the difference in autumn over summer sales can be forecast 
to decrease. 

In Table 9.2, operator (7) is again nonstationary in both the basic time interval and in the 
seasonal component, and there is again a quadratic tendency over years with the incremental 
changes in the forecasts from one quarter to the next changing linearly. However, in this 
case, they are restricted to have a common rate of change. 
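
The behavior described for the operators in Table 9.2 can be checked numerically by running the homogeneous difference equation forward from a set of starting values. The following is a minimal sketch in Python, not part of the original text; the helper names `poly_mul` and `forecast_path` and the starting values are assumptions made for illustration. It traces operators (4) and (3) for $s = 4$ with the autoregressive parameter set to 0.5.

```python
def poly_mul(a, b):
    """Multiply two polynomials in B given as coefficient lists [c0, c1, ...]."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def forecast_path(operator, initial, n_steps):
    """Extend `initial` by solving operator(B) z(l) = 0 forward in l.

    `operator` holds the coefficients of 1 + c1*B + ... + ck*B^k and
    `initial` must supply exactly k starting values.
    """
    k = len(operator) - 1
    if len(initial) != k:
        raise ValueError("need exactly k initial values")
    path = list(initial)
    for _ in range(n_steps):
        path.append(-sum(operator[j] * path[-j] for j in range(1, k + 1)))
    return path


s = 4
start = [1.0, 3.0, 2.0, 4.0, 2.5]   # s + 1 arbitrary starting values (hypothetical)

# Operator (4): (1 - B)(1 - B^s). Each quarter moves by the same yearly increment.
op4 = poly_mul([1.0, -1.0], [1.0] + [0.0] * (s - 1) + [-1.0])
path4 = forecast_path(op4, start, 12)
yearly_steps4 = [path4[i + s] - path4[i] for i in range(len(path4) - s)]

# Operator (3): (1 - B)(1 - 0.5 B^s). Every quarter settles to a common level.
seasonal_ar = 0.5
op3 = poly_mul([1.0, -1.0], [1.0] + [0.0] * (s - 1) + [-seasonal_ar])
path3 = forecast_path(op3, start, 80)
# z(l) - 0.5 z(l - s) is constant along the path; dividing by 1 - 0.5 gives the limit.
limit3 = (start[s] - seasonal_ar * start[0]) / (1.0 - seasonal_ar)
```

Under these assumptions, `yearly_steps4` is constant, matching the form $b_{0m} + b_1 r$, and the last values of `path3` approach `limit3` for every quarter.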

9.3.5 Choice of Transformation 

It is particularly true for seasonal models that the weighted averages of previous data 
values, which comprise the forecasts, may extend far back into the series. Care is therefore 
needed in choosing a transformation in terms of which a parsimonious linear model will
closely apply over a sufficient stretch of the series. Simple graphical analysis can often
suggest such a transformation. Thus, an appropriate transformation may be suggested by
determining in what metric the amplitude of the seasonal component is roughly independent
of the level of the series. To illustrate how a data-based transformation may be chosen more
exactly, denote the untransformed airline data by $x$, and let us assume that some power
transformation [$z = x^{\lambda}$ for $\lambda \neq 0$, $z = \ln(x)$ for $\lambda = 0$] may be needed to make the model
(9.2.1) appropriate. Then, as suggested in Section 4.1.3, the approach of Box and Cox
(1964) may be followed, and the maximum likelihood value obtained by fitting the model
to $x^{(\lambda)} = (x^{\lambda} - 1)/(\lambda \dot{x}^{\lambda - 1})$ for various values of $\lambda$, and choosing the value of $\lambda$ that results
in the smallest residual sum of squares $S_{\lambda}$. In this expression, $\dot{x}$ is the geometric mean of
the series $x$, and it is easily shown that $x^{(0)} = \dot{x} \ln(x)$. For the airline data, we find


$\lambda$ | $S_{\lambda}$
-0.4 | 13,825.5
-0.3 | 12,794.6
-0.2 | 12,046.0
-0.1 | 11,627.2
0.0 | 11,458.1
0.1 | 11,554.3
0.2 | 11,784.3
0.3 | 12,180.0
0.4 | 12,633.2


The maximum likelihood value is thus close to $\lambda = 0$, confirming the appropriateness of
the logarithmic transformation for the airline series. 
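
The normalized power transformation used in this comparison is simple to compute. The sketch below is an illustration only, not the procedure used to produce the values above; the function names and the short test series are hypothetical. It evaluates $x^{(\lambda)}$ and shows that the $\lambda = 0$ case is the limit of the general expression.

```python
import math


def geometric_mean(x):
    """Geometric mean of a series of positive values."""
    return math.exp(sum(math.log(v) for v in x) / len(x))


def normalized_power_transform(x, lam):
    """Return (x^lam - 1) / (lam * gm^(lam - 1)), or gm * ln(x) when lam == 0."""
    gm = geometric_mean(x)
    if lam == 0:
        return [gm * math.log(v) for v in x]
    scale = lam * gm ** (lam - 1.0)
    return [(v ** lam - 1.0) / scale for v in x]


series = [2.0, 4.0, 8.0, 16.0]            # hypothetical positive data
at_zero = normalized_power_transform(series, 0.0)
near_zero = normalized_power_transform(series, 1e-6)
unit_power = normalized_power_transform(series, 1.0)   # reduces to x - 1
```

Because of the scaling by $\dot{x}^{\lambda - 1}$, residual sums of squares from models fitted to $x^{(\lambda)}$ are expressed in comparable units across values of $\lambda$, which is what allows them to be compared directly.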


9.4 STRUCTURAL COMPONENT MODELS AND DETERMINISTIC 
SEASONAL COMPONENTS 

A traditional method to represent a seasonal time series has been to decompose the series 
into trend, seasonal, and noise components, as $z_t = T_t + S_t + N_t$, where the trend $T_t$ and
seasonal component $S_t$ are represented as deterministic functions of time using polynomial
and sinusoidal functions, respectively. However, as noted in Section 9.1.1, the deterministic 
nature of the trend and seasonal components limits the applicability of these models. 
Subsequently, models that permit random variation in the trend and seasonal components,
referred to as structural component models, have become increasingly popular for time 
series modeling (e.g., Harvey, 1989; Harvey and Todd, 1983; Gersch and Kitagawa, 1983; 
Kitagawa and Gersch, 1984; Hillmer and Tiao, 1982; and Durbin and Koopman, 2012). 
We discuss these models briefly in the following sections. 

9.4.1 Structural Component Time Series Models 

In general, a univariate structural component time series model is one in which an observed
series $z_t$ is formulated as the sum of unobservable component or "signal" time series.
Although the components are unobservable and cannot be uniquely specified, they will
usually have direct meaningful interpretation, such as representing the seasonal behavior
or the long-term trend of an economic time series or a physical signal that is corrupted
by measurement noise in the engineering setting. Thus, the models attempt to describe
the main features of the series as well as provide a basis for forecasting, signal extraction,
seasonal adjustments, and other applications. For a monthly time series, the trend $T_t$
might be assumed to follow a simple random walk model or some extension such as the
ARIMA(0,1,1) model $(1 - B)T_t = (1 - \theta B)a_t$, or the ARIMA(0,2,2) model $(1 - B)^2 T_t =
(1 - \theta_1 B - \theta_2 B^2)a_t$, while the seasonal component might be specified as a “seasonal
random walk” $(1 - B^{12})S_t = b_t$, where $a_t$ and $b_t$ are independent white noise processes.

An appeal of this structural modeling approach, especially for seasonal adjustments 
and signal extraction, is that Kalman filtering and smoothing methods based on state- 
space formulations of the model, as discussed in Section 5.5, can be employed. The exact 
likelihood function can be constructed based on the state-space model form, as described 
in Section 7.4, and used for parameter estimation. The Kalman filtering and smoothing 
procedures can then be used to obtain estimates of the unobservable component series 
such as the trend $\{T_t\}$ and seasonal $\{S_t\}$ components, which are now included as elements
within the state vector $Y_t$ in the general state-space model (5.5.4) and (5.5.5).

Basic Structural Model. As a specific illustration, consider the basic structural model
(BSM) for seasonal time series with period $s$ as formulated by Harvey (1989). The model
is defined by $z_t = T_t + S_t + \varepsilon_t$, where $T_t$ follows the “local linear trend model” defined
by

$$T_t = T_{t-1} + \beta_{t-1} + \eta_t \qquad \beta_t = \beta_{t-1} + \xi_t$$  (9.4.1)

and $S_t$ follows the “dummy variable seasonal component model” defined by

$$(1 + B + B^2 + \cdots + B^{s-1})S_t = \omega_t$$  (9.4.2)

where $\eta_t$, $\xi_t$, $\omega_t$, and $\varepsilon_t$ are mutually uncorrelated white noise processes with zero means
and variances $\sigma_\eta^2$, $\sigma_\xi^2$, $\sigma_\omega^2$, and $\sigma_\varepsilon^2$, respectively.

This local linear trend model is a stochastic generalization of the deterministic linear
trend $T_t = \alpha + \beta t$, where $\alpha$ and $\beta$ are constants. In (9.4.1), the effect of the random
disturbance $\eta_t$ is to allow the level of the trend to shift up and down, while $\xi_t$ allows the slope
to change. As special limiting cases, if $\sigma_\xi^2 = 0$, then $\beta_t = \beta_{t-1}$ and so $\beta_t$ is a fixed constant
$\beta$ for all $t$ and the trend follows the random walk with drift $(1 - B)T_t = \beta + \eta_t$. If $\sigma_\eta^2 = 0$
in addition, then (9.4.1) collapses to the deterministic model $T_t = T_{t-1} + \beta$ or $T_t = \alpha + \beta t$.
The seasonal component model (9.4.2) requires the seasonal effects $S_t$ to sum to zero over
$s$ consecutive values of a seasonal period, subject to a random disturbance with mean zero
which allows the seasonal effects to change gradually over time. Again, a special limiting
case of deterministic seasonal components with a fixed seasonal pattern about an average
of zero, $S_t = S_{t-s}$ with $S_t + S_{t-1} + \cdots + S_{t-s+1} = 0$, occurs when $\sigma_\omega^2 = 0$. Thus, one
attraction of a model such as (9.4.1) and (9.4.2) is that it generalizes a regression-type model in
which the trend is represented by a fixed straight line and the seasonality by fixed seasonal
effects using indicator variables, by allowing the trend and seasonality to vary over time,
and still yields the deterministic components as special limiting cases.
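
The limiting cases can be made concrete by simulating (9.4.1) and (9.4.2) directly. The following Python sketch is an illustration under stated assumptions: the function `simulate_bsm`, its argument names, and the starting values are hypothetical and not taken from the text. With all disturbance standard deviations set to zero, the simulated trend is a straight line and the seasonal pattern repeats exactly and sums to zero over each period.

```python
import random


def simulate_bsm(n, s, level0, slope0, seasonal0,
                 sd_eta, sd_xi, sd_omega, sd_eps, seed=0):
    """Simulate z_t = T_t + S_t + eps_t under (9.4.1) and (9.4.2).

    `seasonal0` lists the s - 1 most recent seasonal effects, newest first.
    """
    rng = random.Random(seed)
    level, slope = level0, slope0
    recent = list(seasonal0)
    trend, seasonal, z = [], [], []
    for _ in range(n):
        level = level + slope + rng.gauss(0.0, sd_eta)   # T_t = T_{t-1} + beta_{t-1} + eta_t
        slope = slope + rng.gauss(0.0, sd_xi)            # beta_t = beta_{t-1} + xi_t
        # (1 + B + ... + B^{s-1}) S_t = omega_t
        new_effect = -sum(recent) + rng.gauss(0.0, sd_omega)
        recent = [new_effect] + recent[:-1]
        trend.append(level)
        seasonal.append(new_effect)
        z.append(level + new_effect + rng.gauss(0.0, sd_eps))
    return trend, seasonal, z


# Limiting case: zero variances give a fixed line plus a fixed seasonal pattern.
trend, seasonal, z = simulate_bsm(
    n=12, s=4, level0=10.0, slope0=0.5, seasonal0=[1.0, -2.0, 0.5],
    sd_eta=0.0, sd_xi=0.0, sd_omega=0.0, sd_eps=0.0)
```

Setting any of the standard deviations to a positive value lets the corresponding component drift, which is the sense in which the structural model generalizes the fixed regression-type representation.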

We illustrate the state-space representation of the model (9.4.1) and (9.4.2) for the case
of quarterly time series with $s = 4$. For this, we define the state vector as

$$Y_t = (T_t, \beta_t, S_t, S_{t-1}, S_{t-2})'$$

and let $a_t = (\eta_t, \xi_t, \omega_t)'$. Then we have the transition equation

$$Y_t = \begin{bmatrix} 1 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & -1 & -1 & -1 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{bmatrix} Y_{t-1} + \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} a_t$$  (9.4.3)

or $Y_t = \Phi Y_{t-1} + \Psi a_t$, together with the observation equation $z_t = T_t + S_t + \varepsilon_t =
[1 \; 0 \; 1 \; 0 \; 0] Y_t + \varepsilon_t = H Y_t + \varepsilon_t$. Hence, the variance component parameters of the
structural model can be estimated by maximum likelihood methods using the state-space
representation and innovations form of the likelihood function, as discussed in Sections
5.5 and 7.4. Once these estimates are obtained, the desired optimal smoothed estimates
$\hat{T}_{t|n} = E[T_t \mid z_1, \ldots, z_n]$ and $\hat{S}_{t|n} = E[S_t \mid z_1, \ldots, z_n]$ of the trend and seasonal components
based on the observed series $z_1, \ldots, z_n$ can readily be obtained by applying the Kalman
filtering and smoothing techniques to the state-space representation.
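
The state-space matrices follow mechanically from (9.4.1) and (9.4.2) for any period $s$. The sketch below is a hedged illustration; the names `bsm_state_space` and `step`, and the symbols `Phi`, `Psi`, and `H` used for the transition, disturbance, and observation matrices, are labels chosen here. It builds the matrices and applies one transition with the disturbances set to zero.

```python
def bsm_state_space(s):
    """Return (Phi, Psi, H) for the state (T_t, beta_t, S_t, ..., S_{t-s+2})'."""
    dim = s + 1
    Phi = [[0.0] * dim for _ in range(dim)]
    Phi[0][0] = Phi[0][1] = 1.0          # T_t = T_{t-1} + beta_{t-1} + eta_t
    Phi[1][1] = 1.0                      # beta_t = beta_{t-1} + xi_t
    for j in range(2, dim):
        Phi[2][j] = -1.0                 # S_t = -(S_{t-1} + ... + S_{t-s+1}) + omega_t
    for i in range(3, dim):
        Phi[i][i - 1] = 1.0              # shift older seasonal effects down one slot
    Psi = [[0.0] * 3 for _ in range(dim)]
    Psi[0][0] = Psi[1][1] = Psi[2][2] = 1.0
    H = [1.0, 0.0, 1.0] + [0.0] * (dim - 3)
    return Phi, Psi, H


def step(Phi, Psi, state, shocks):
    """One transition Y_t = Phi Y_{t-1} + Psi a_t."""
    return [sum(p * y for p, y in zip(row_phi, state)) +
            sum(q * a for q, a in zip(row_psi, shocks))
            for row_phi, row_psi in zip(Phi, Psi)]


Phi, Psi, H = bsm_state_space(4)
previous = [10.0, 1.0, 0.5, -2.0, 1.0]   # (T, beta, S, S lagged once, S lagged twice) at t - 1
current = step(Phi, Psi, previous, [0.0, 0.0, 0.0])
observation_mean = sum(h * y for h, y in zip(H, current))   # T_t + S_t
```

In an actual application these matrices would be passed to a Kalman filter and smoother; that step is not shown here.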


Relation to ARIMA Model. It should be noted from general results of Appendix A4.3 
that structural models such as the BSM have an equivalent ARIMA model representation, 
which is sometimes referred to as its reduced form in this context. For instance, the process 
$T_t$ defined by the local linear trend model (9.4.1) satisfies

$$(1 - B)^2 T_t = (1 - B)\beta_{t-1} + (1 - B)\eta_t = \xi_{t-1} + (1 - B)\eta_t$$

It follows from Appendix A4.3.1 that $\xi_{t-1} + (1 - B)\eta_t$ can be represented as an MA(1)
process $(1 - \theta B)a_t$, so that $(1 - B)^2 T_t = (1 - \theta B)a_t$ and $T_t$ has the ARIMA(0, 2, 1) model
as a reduced form. For another illustration, consider $z_t = T_t + S_t + N_t$, where it is assumed
that 


$$(1 - B)T_t = (1 - \theta_T B)a_t \qquad (1 - B^{12})S_t = (1 - \Theta_S B^{12})b_t$$

and $N_t = c_t$ is white noise. Then, we have

$$(1 - B)(1 - B^{12})z_t = (1 - B^{12})(1 - \theta_T B)a_t + (1 - B)(1 - \Theta_S B^{12})b_t + (1 - B)(1 - B^{12})c_t$$

and according to the developments in Appendix A4.3, the right-hand-side expression above
can be represented as the MA model $(1 - \theta_1 B - \theta_{12} B^{12} - \theta_{13} B^{13})\varepsilon_t$, where $\varepsilon_t$ is white
noise, since the right-hand side will have nonzero autocovariances only at the lags 0, 1,
11, 12, and 13. Under additional structure, the MA operator could have the multiplicative
form, but in general we see that the foregoing structural model, $z_t = T_t + S_t + N_t$, has an
equivalent ARIMA model representation as

$$(1 - B)(1 - B^{12})z_t = (1 - \theta_1 B - \theta_{12} B^{12} - \theta_{13} B^{13})\varepsilon_t$$
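
The statement about which autocovariances are nonzero can be verified by expanding each of the three moving average terms and adding their autocovariances, since the three noise series are independent. The following Python sketch does this for illustrative parameter values; the numerical values and helper names are assumptions, not values from the text.

```python
def poly_mul(a, b):
    """Multiply two polynomials in B given as coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def ma_autocov(psi, variance, max_lag):
    """Autocovariances at lags 0..max_lag of psi(B) e_t, e_t white noise."""
    return [variance * sum(psi[j] * psi[j + k] for j in range(len(psi) - k))
            if k < len(psi) else 0.0
            for k in range(max_lag + 1)]


def seasonal_factor(coef, period=12):
    """Coefficients of 1 - coef * B^period."""
    return [1.0] + [0.0] * (period - 1) + [-coef]


theta_T, Theta_S = 0.4, 0.6            # illustrative MA parameters
var_a, var_b, var_c = 1.0, 0.5, 2.0    # illustrative noise variances

parts = [
    (poly_mul(seasonal_factor(1.0), [1.0, -theta_T]), var_a),   # (1 - B^12)(1 - theta_T B) a_t
    (poly_mul([1.0, -1.0], seasonal_factor(Theta_S)), var_b),   # (1 - B)(1 - Theta_S B^12) b_t
    (poly_mul([1.0, -1.0], seasonal_factor(1.0)), var_c),       # (1 - B)(1 - B^12) c_t
]
max_lag = 20
gamma = [sum(ma_autocov(psi, var, max_lag)[k] for psi, var in parts)
         for k in range(max_lag + 1)]
nonzero_lags = [k for k, g in enumerate(gamma) if abs(g) > 1e-12]
```

For these values `nonzero_lags` contains only the lags 0, 1, 11, 12, and 13, in agreement with the argument above.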

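Example: Airline Data. Harvey (1989, Sec. 4.5) reported results of maximum likelihood
estimation of the BSM defined by (9.4.1) and (9.4.2) for the logged monthly airline
passenger data, using the data period from 1949 to 1958. The ML estimates were such that
$\hat{\sigma}_\xi^2 = 0$ and $\hat{\sigma}_\omega^2$ was very small relative to $\hat{\sigma}_\eta^2$ and $\hat{\sigma}_\varepsilon^2$. The zero estimate $\hat{\sigma}_\xi^2 = 0$ implies
that the model (9.4.1) for the trend $T_t$ reduces to the random walk with constant drift,
$(1 - B)T_t = \beta + \eta_t$, while the seasonal component model is $(1 + B + \cdots + B^{11})S_t = \omega_t$.
Differencing the series $z_t$ thus implies that

$$w_t = (1 - B)(1 - B^{12})z_t = (1 - B)(1 - B^{12})T_t + (1 - B)(1 - B^{12})S_t + (1 - B)(1 - B^{12})\varepsilon_t$$
$$\phantom{w_t} = (1 - B^{12})\eta_t + (1 - B)^2\omega_t + (1 - B)(1 - B^{12})\varepsilon_t$$

It readily follows that the autocovariances of the differenced series $w_t = \nabla\nabla_{12} z_t$ for this
model are

$$\gamma_0 = 2\sigma_\eta^2 + 6\sigma_\omega^2 + 4\sigma_\varepsilon^2$$
$$\gamma_1 = -4\sigma_\omega^2 - 2\sigma_\varepsilon^2$$
$$\gamma_2 = \sigma_\omega^2$$  (9.4.4)
$$\gamma_{11} = \sigma_\varepsilon^2 = \gamma_{13}$$
$$\gamma_{12} = -\sigma_\eta^2 - 2\sigma_\varepsilon^2$$

and $\gamma_j = 0$ otherwise. In particular, these give the autocorrelations

$$\rho_1 = -\frac{\sigma_\varepsilon^2 + 2\sigma_\omega^2}{2\sigma_\varepsilon^2 + \sigma_\eta^2 + 3\sigma_\omega^2} \qquad \rho_{12} = -\frac{2\sigma_\varepsilon^2 + \sigma_\eta^2}{2(2\sigma_\varepsilon^2 + \sigma_\eta^2 + 3\sigma_\omega^2)}$$

and $\rho_{11} = \rho_{13} = \sigma_\varepsilon^2/[2(2\sigma_\varepsilon^2 + \sigma_\eta^2 + 3\sigma_\omega^2)]$.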
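
The autocorrelations implied by (9.4.4) are easily evaluated for any set of variance components. The following is a small sketch; the function name is hypothetical and the variances are round numbers chosen only to show the pattern, not Harvey's estimates.

```python
def bsm_autocorrelations(var_eta, var_omega, var_eps):
    """Autocorrelations of w_t implied by (9.4.4), with zero slope variance."""
    gamma0 = 2.0 * var_eta + 6.0 * var_omega + 4.0 * var_eps
    gamma = {
        1: -4.0 * var_omega - 2.0 * var_eps,
        2: var_omega,
        11: var_eps,
        12: -var_eta - 2.0 * var_eps,
        13: var_eps,
    }
    return {lag: value / gamma0 for lag, value in gamma.items()}


# Hypothetical variances; var_omega = 0 mimics a negligible seasonal disturbance.
rho = bsm_autocorrelations(var_eta=2.0, var_omega=0.0, var_eps=1.0)
```

When the seasonal disturbance variance is zero, $\rho_2 = 0$ and $\rho_{12} = -0.5$ whatever the other two variances are, while $\rho_1$ and $\rho_{11} = \rho_{13}$ depend only on the ratio $\sigma_\eta^2/\sigma_\varepsilon^2$.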

The autocorrelations calculated using estimates of the variance components given in 
Table 4.5.3 of Harvey (1989) are shown in Table 9.3 for the logged airline data. Also 
shown in Table 9.3 are the autocorrelations for the differenced series $w_t = \nabla\nabla_{12} z_t$ in
the seasonal $(0,1,1) \times (0,1,1)_{12}$ model. These were calculated from (9.2.18) using the
parameter estimates $\hat{\theta} = 0.396$, $\hat{\Theta} = 0.614$, and $\hat{\sigma}_a^2 = 1.34 \times 10^{-3}$ reported in Section 9.2.4.
Table 9.3 shows a close agreement between the two sets of autocorrelations. Hence, for the 
logged airline data, both modeling approaches provide very similar representations of the 
basic trend and seasonality in the series. 
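
The ARIMA autocorrelations quoted for the airline model can be reproduced from the expanded moving average operator. The sketch below is an illustration; the helper `ma_autocorr` is a name chosen here, and the calculation uses the general formula for the autocorrelations of a finite moving average rather than quoting (9.2.18).

```python
def ma_autocorr(psi, lags):
    """Autocorrelations of psi(B) a_t at the requested lags."""
    gamma0 = sum(c * c for c in psi)
    return {k: sum(psi[j] * psi[j + k] for j in range(len(psi) - k)) / gamma0
            for k in lags}


theta, Theta = 0.396, 0.614
# (1 - theta B)(1 - Theta B^12) = 1 - theta B - Theta B^12 + theta * Theta B^13
psi = [1.0, -theta] + [0.0] * 10 + [-Theta, theta * Theta]
rho = ma_autocorr(psi, [1, 2, 11, 12, 13])
rounded = {k: round(v, 2) for k, v in rho.items()}
```

Rounded to two decimals, these values agree with the ARIMA row of Table 9.3.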
TABLE 9.3 Comparison of the Autocorrelations of $w_t = \nabla\nabla_{12} z_t$ for the Basic Structural
Model and the Seasonal ARIMA Model $(0,1,1) \times (0,1,1)_{12}$ for Logged Airline Data

Model | $\rho_1$ | $\rho_2$ | $\rho_{11}$ | $\rho_{12}$ | $\rho_{13}$
Basic structural model | -0.26 | 0.00 | 0.12 | -0.49 | 0.12
ARIMA $(0,1,1) \times (0,1,1)_{12}$ | -0.34 | 0.00 | 0.15 | -0.45 | 0.15


9.4.2 Deterministic Seasonal and Trend Components and Common Factors 

Now in some applications, particularly in the physical sciences, a seasonal or trend
component could be nearly deterministic. For example, suppose the seasonal component can
be approximated as

$$S_t = \beta_0 + \sum_{j=1}^{6}\left[\beta_{1j}\cos\left(\frac{2\pi jt}{12}\right) + \beta_{2j}\sin\left(\frac{2\pi jt}{12}\right)\right]$$

where the $\beta$ coefficients are constants. We note that this can be viewed as a special
case of the previous examples, since $S_t$ satisfies $(1 + B + B^2 + \cdots + B^{11})S_t = 12\beta_0$ or
$(1 - B^{12})S_t = 0$. Now, ignoring the trend component for the present and assuming that
$z_t = S_t + N_t$, where $(1 - B^{12})S_t = 0$ and $N_t = (1 - \theta_N B)a_t$, say, we find that $z_t$ follows
the seasonal ARIMA model

$$(1 - B^{12})z_t = (1 - \theta_N B)(1 - B^{12})a_t$$

However, we now notice the presence of a common factor of $1 - B^{12}$ in both the generalized
AR operator and the MA operator of this model; equivalently, we might say that $\Theta = 1$
for the seasonal MA operator $\Theta(B^{12}) = (1 - \Theta B^{12})$. This is caused by and, in fact, is
indicative of the presence of the deterministic seasonal component $S_t$ in the original form
of the model.
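
Both operator identities for the trigonometric seasonal component can be confirmed numerically. The sketch below is illustrative only; the coefficients are arbitrary values chosen here, and the sum is taken over the six harmonics $j = 1, \ldots, 6$ of a monthly period.

```python
import math


def harmonic_seasonal(t, beta0, cos_coefs, sin_coefs, period=12):
    """S_t = beta0 + sum_j [b1j cos(2 pi j t / period) + b2j sin(2 pi j t / period)]."""
    total = beta0
    for j, (b1, b2) in enumerate(zip(cos_coefs, sin_coefs), start=1):
        angle = 2.0 * math.pi * j * t / period
        total += b1 * math.cos(angle) + b2 * math.sin(angle)
    return total


beta0 = 3.0                                    # hypothetical coefficients
cos_coefs = [1.0, -0.5, 0.3, 0.0, 0.2, 0.1]    # j = 1, ..., 6
sin_coefs = [0.7, 0.4, 0.0, -0.2, 0.1, 0.0]

S = [harmonic_seasonal(t, beta0, cos_coefs, sin_coefs) for t in range(48)]
seasonal_diffs = [S[t] - S[t - 12] for t in range(12, 48)]     # (1 - B^12) S_t
moving_sums = [sum(S[t - 11:t + 1]) for t in range(11, 48)]    # (1 + B + ... + B^11) S_t
```

The seasonal differences vanish and every 12-term moving sum equals $12\beta_0$, up to rounding error.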

In general, the presence of deterministic seasonal or trend components in the structure
of a time series $z_t$ is characterized by common factors of $(1 - B^s)$ or $(1 - B)$ in the
generalized AR operator and the MA operator of the model. We can state the result
more formally as follows. Suppose that $z_t$ follows the model $\varphi(B)z_t = \theta_0 + \theta(B)a_t$, and
the operators $\varphi(B)$ and $\theta(B)$ contain a common factor $G(B)$, so that $\varphi(B) = G(B)\varphi_1(B)$
and $\theta(B) = G(B)\theta_1(B)$. Hence, the model is

$$G(B)\varphi_1(B)z_t = \theta_0 + G(B)\theta_1(B)a_t$$  (9.4.5)

Let $G(B) = 1 - g_1 B - \cdots - g_r B^r$ and suppose that this polynomial has roots $G_1^{-1}, \ldots, G_r^{-1}$
which are distinct. Then, the common factor $G(B)$ can be canceled from both sides of the
above model, but a term of the form $\sum_{i=1}^{r} c_i G_i^t$ needs to be added. Thus, the model (9.4.5)
can be expressed in the equivalent form as

$$\varphi_1(B)z_t = c_{0t} + \sum_{i=1}^{r} c_i G_i^t + \theta_1(B)a_t$$  (9.4.6)

where the $c_i$ are constants, and $c_{0t}$ is a term that satisfies $G(B)c_{0t} = \theta_0$. Modifications of
the result for the case where some of the roots $G_i^{-1}$ are repeated are straightforward.
Thus, it is seen that an equivalent representation for the above model is

$$\varphi_1(B)z_t = x_t + \theta_1(B)a_t$$

where $x_t$ is a deterministic function of $t$ that satisfies $G(B)x_t = \theta_0$. Note that roots in $G(B)$
corresponding to “stationary factors,” such that $|G_i| < 1$, will make a contribution to the
component $x_t$ that is only transient and so negligible, and hence these terms may be ignored.
Thus, only those factors whose roots correspond to nonstationary “differencing” and other
“simplifying” operators, such as $(1 - B)$ and $(1 - B^s)$, with roots $|G_i| = 1$ need to be
included in the deterministic component $x_t$. These common factors will, of course, give rise to
deterministic functions in $x_t$ that are of the form of polynomials, sine and cosine functions,
and products of these, depending on the roots of the common factor $G(B)$.

Examples. For a few simple examples, the model $(1 - B)z_t = \theta_0 + (1 - B)\theta_1(B)a_t$ has an
equivalent form $z_t = c_1 + \theta_0 t + \theta_1(B)a_t$, which occurs upon cancellation of the common
factor $(1 - B)$, while the model $(1 - \sqrt{3}B + B^2)z_t = \theta_0 + (1 - \sqrt{3}B + B^2)\theta_1(B)a_t$ has
an equivalent model form as $z_t = c_0 + c_1\cos(2\pi t/12) + c_2\sin(2\pi t/12) + \theta_1(B)a_t$, where
$(1 - \sqrt{3} + 1)c_0 = \theta_0$.
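
A direct numerical check of these two examples is straightforward. In the sketch below, which is illustrative and uses arbitrary values for the constants, the operator $G(B)$ is applied to the deterministic component $x_t$ and the result is compared with $\theta_0$.

```python
import math

theta0 = 0.5                               # hypothetical constant term
c0 = theta0 / (2.0 - math.sqrt(3.0))       # from (1 - sqrt(3) + 1) c0 = theta0
c1, c2 = 1.3, -0.8                         # arbitrary constants


def x(t):
    """Deterministic component c0 + c1 cos(2 pi t / 12) + c2 sin(2 pi t / 12)."""
    angle = 2.0 * math.pi * t / 12.0
    return c0 + c1 * math.cos(angle) + c2 * math.sin(angle)


# Apply G(B) = 1 - sqrt(3) B + B^2 to x_t; the sinusoidal part is annihilated.
filtered = [x(t) - math.sqrt(3.0) * x(t - 1) + x(t - 2) for t in range(2, 40)]

# Linear-trend case: G(B) = 1 - B applied to c1 + theta0 * t returns theta0.
trend_diffs = [(c1 + theta0 * t) - (c1 + theta0 * (t - 1)) for t in range(1, 40)]
```

The cancellation works because $1 - \sqrt{3}B + B^2$ has its roots on the unit circle at angles $\pm 2\pi/12$, so it removes any sinusoid of period 12.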

Detection of a deterministic component such as x t above in a time series z t may occur 
after an ARIMA model is estimated and common or near-common factors are identified. 
Hence, the ARIMA time series methodology, in a sense, can indicate when a time series 
may contain deterministic seasonal or trend components. The presence of a deterministic 
component is characterized by a factor in the MA operator with roots on, or very near, the
unit circle, which correspond to a differencing factor that has been applied to the original
series in the formulation of the ARIMA model. When this situation occurs, the series is
sometimes said to be “over-differenced”. Formal tests for the presence of a unit root in the
MA operator, implying the presence of a deterministic component, have been developed by
Saikkonen and Luukkonen (1993), Leybourne and McCabe (1994), and Tam and Reinsel
(1997, 1998), among others. These tests can also be viewed as tests for unit roots in the
generalized AR operator $\varphi(B)$ in the sense that if one performs the differencing and then
concludes that the MA operator does not have a unit root, then the unit root in the AR
operator is supported.

Deterministic components implied by the cancellation of factors could be estimated 
directly by a combination of regression models and ARIMA time series methods, as will 
be discussed in Section 9.5. An additional consequence of the presence of deterministic 
factors for forecasting is that at least some of the coefficients $b_j^{(t)}$ in the general forecast
function $\hat{z}_t(l)$ for $z_{t+l}$ in (5.3.3) will not be adaptive but will be deterministic (fixed)
constants. Results such as those described above concerning the relationship between 
common factors in the generalized AR and the MA operators of ARIMA models and the 
presence of deterministic polynomial and sinusoidal components have been discussed by 
Abraham and Box (1978), Harvey (1981), and Bell (1987). 

9.4.3 Estimation of Unobserved Components in Structural Models 

A common problem of interest for the structural model is the estimation of the unobservable
series $S_t$ from values of the observed series $z_t$. We suppose that $S_t$ and $z_t$ are stationary
processes with zero means and autocovariance functions $\gamma_s(l) = E[S_t S_{t+l}]$ and $\gamma_z(l) =
E[z_t z_{t+l}]$, and cross-covariance function $\gamma_{sz}(l) = E[S_t z_{t+l}]$. Then, specifically, suppose
we observe the values $z_t$, $t \leq \tau$, and want to determine the linear filter

$$\hat{S}_t = \sum_{u=0}^{\infty} \nu_u^{(\tau)} z_{\tau-u} = \nu^{(\tau)}(B)z_\tau$$  (9.4.7)

of $\{z_t\}$ such that the value $\hat{S}_t$ is close to $S_t$ in the mean square error sense, that is,
$E[(S_t - \hat{S}_t)^2]$ is a minimum among all possible linear filters. A typical model for which
this problem arises is the “signal extraction” model, in which there is a signal $S_t$ of interest,
but what is observed is a noise-corrupted version of the signal so that

$$z_t = S_t + N_t$$

where N t is a noise component. The problem then is to estimate values of the signal series S t 
given values on the observed series z t . Often, the filtering and smoothing algorithms for the 
state-space model, as discussed in Section 5.5.3, can be applied to this situation. However, 
while these algorithms are computationally attractive in practice, explicit expressions for 
the coefficients $\nu_u^{(\tau)}$ in (9.4.7) cannot usually be obtained directly from the state-space
algorithms. These expressions can be derived more readily in the “classical” approach, 
which assumes that an infinite extent of observations is available for filtering or smoothing. 
This section provides a brief overview of some classical filtering and smoothing results 
that can be used to study the coefficients in (9.4.7). Typically, from a practical point of 
view, the classical results provide a good approximation to exact filtering and smoothing 
results that are based on a finite sample of observations $z_1, \ldots, z_n$.

Smoothing and Filtering for Time Series. We suppose that $\{z_t\}$ has the infinite MA
representation

$$z_t = \psi(B)a_t = \sum_{j=0}^{\infty} \psi_j a_{t-j}$$

where the $a_t$ are white noise with variance $\sigma_a^2$. Also, let $g_{zs}(B) = \sum_{j=-\infty}^{\infty} \gamma_{zs}(j)B^j$ be the
cross-covariance generating function between $z_t$ and $S_t$. Then, it can be derived (e.g.,
Whittle, 1963, Chapters 5 and 6; Priestley, 1981, Chapter 10) that the optimal linear filter
for the estimate $\hat{S}_t = \sum_{u=0}^{\infty} \nu_u^{(\tau)} z_{\tau-u} = \nu^{(\tau)}(B)z_\tau$, where $\nu^{(\tau)}(B) = \sum_{u=0}^{\infty} \nu_u^{(\tau)} B^u$, is given by

$$\nu^{(\tau)}(B) = \frac{1}{\sigma_a^2\psi(B)}\left[\frac{B^{\tau-t}g_{zs}(B)}{\psi(B^{-1})}\right]_+$$  (9.4.8)

Here, for a general operator $v(B) = \sum_{j=-\infty}^{\infty} v_j B^j$, the notation $[v(B)]_+$ is used to denote
$\sum_{j=0}^{\infty} v_j B^j$.

To derive the result (9.4.8) for the optimal linear filter, note that, since $z_t = \psi(B)a_t$, the
linear filter can be expressed as

$$\hat{S}_t = \nu^{(\tau)}(B)z_\tau = \nu^{(\tau)}(B)\psi(B)a_\tau = h^{(\tau)}(B)a_\tau$$

where $h^{(\tau)}(B) = \nu^{(\tau)}(B)\psi(B) = \sum_{j=0}^{\infty} h_j^{(\tau)} B^j$. Then, we can determine the coefficients $h_j^{(\tau)}$
to minimize the mean squared error $E[(S_t - \hat{S}_t)^2] = E[(S_t - \sum_{j=0}^{\infty} h_j^{(\tau)} a_{\tau-j})^2]$. Since the
$\{a_t\}$ are mutually uncorrelated, by standard linear least-squares arguments the values of
the coefficients that minimize this mean squared error are

$$h_j^{(\tau)} = \frac{\text{cov}[a_{\tau-j}, S_t]}{\text{var}[a_{\tau-j}]} = \frac{\gamma_{as}(j + t - \tau)}{\sigma_a^2} \qquad j \geq 0$$

Hence, the optimal linear filter is

$$h^{(\tau)}(B) = \frac{1}{\sigma_a^2}\sum_{j=0}^{\infty}\gamma_{as}(j + t - \tau)B^j = \frac{1}{\sigma_a^2}[B^{\tau-t}g_{as}(B)]_+$$  (9.4.9)

where $g_{as}(B)$ denotes the cross-covariance generating function between $a_t$ and $S_t$.
Also, note that $\gamma_{zs}(j) = \text{cov}[\sum_{i=0}^{\infty}\psi_i a_{t-i}, S_{t+j}] = \sum_{i=0}^{\infty}\psi_i\gamma_{as}(i + j)$, so it follows that
$g_{zs}(B) = \psi(B^{-1})g_{as}(B)$. Therefore, the optimal linear filter in (9.4.9) is $h^{(\tau)}(B) =
(1/\sigma_a^2)[B^{\tau-t}g_{zs}(B)/\psi(B^{-1})]_+$, and, hence, the optimal filter in terms of $\hat{S}_t = \nu^{(\tau)}(B)z_\tau$
is $\nu^{(\tau)}(B) = h^{(\tau)}(B)/\psi(B)$, which yields the result (9.4.8). The mean squared error of the
optimal filter, since $\hat{S}_t = \sum_{j=0}^{\infty} h_j^{(\tau)} a_{\tau-j}$, is easily seen from the above derivation to be

$$E[(S_t - \hat{S}_t)^2] = E[S_t^2] - E[\hat{S}_t^2] = \text{var}[S_t] - \sigma_a^2\sum_{j=0}^{\infty}\{h_j^{(\tau)}\}^2$$


In the smoothing case where $\tau = +\infty$, that is, we estimate $S_t$ based on the infinite record
of observations $z_u$, $-\infty < u < \infty$, by a linear filter $\hat{S}_t = \sum_{u=-\infty}^{\infty} \nu_u z_{t-u} = \nu(B)z_t$, the result
(9.4.8) for the optimal filter reduces to

$$\nu(B) = \frac{g_{zs}(B)}{g_{zz}(B)} = \frac{g_{zs}(B)}{\sigma_a^2\psi(B)\psi(B^{-1})}$$  (9.4.10)

For the signal extraction problem, we have $z_t = S_t + N_t$, where it is usually assumed that
the signal $\{S_t\}$ and the noise process $\{N_t\}$ are independent. Thus, in this case we have
$g_{zs}(B) = g_{ss}(B)$, and so in the smoothing case $\tau = +\infty$, we have $\nu(B) = g_{ss}(B)/g_{zz}(B)$ or
$\nu(B) = g_{ss}(B)/[g_{ss}(B) + g_{nn}(B)]$.
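
To see what the smoothing weights look like in a simple case, the ratio $g_{ss}(B)/[g_{ss}(B) + g_{nn}(B)]$ can be evaluated on the unit circle and inverted numerically. The sketch below assumes, purely for illustration, an AR(1) signal $(1 - \phi B)S_t = b_t$ observed in white noise; the function name, parameter values, and the numerical inversion by a midpoint rule are choices made here, not a method described in the text.

```python
import math


def smoother_weights(phi, var_b, var_c, max_lag, n_grid=4096):
    """Weights nu_u, u = 0..max_lag, of g_ss / (g_ss + g_nn) for an AR(1) signal in white noise.

    The ratio is evaluated at B = exp(-i w) and inverted by a midpoint rule
    over one period; the weights are symmetric, nu_{-u} = nu_u.
    """
    freqs = [-math.pi + (k + 0.5) * 2.0 * math.pi / n_grid for k in range(n_grid)]
    gain = []
    for w in freqs:
        g_ss = var_b / (1.0 - 2.0 * phi * math.cos(w) + phi * phi)
        gain.append(g_ss / (g_ss + var_c))
    return [sum(g * math.cos(u * w) for g, w in zip(gain, freqs)) / n_grid
            for u in range(max_lag + 1)]


phi, var_b, var_c = 0.8, 1.0, 1.0          # illustrative values
nu = smoother_weights(phi, var_b, var_c, max_lag=30)
g_ss_at_one = var_b / (1.0 - phi) ** 2
gain_at_zero = g_ss_at_one / (g_ss_at_one + var_c)
total_weight = nu[0] + 2.0 * sum(nu[1:])   # sum over u = -30, ..., 30
```

For this model the weights decay geometrically in $|u|$, and their sum equals the value of the ratio at $B = 1$, so the smoother passes slowly varying signal almost unchanged while damping the noise.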


Smoothing Relations for the Signal Plus Noise or Structural Components Model. The
preceding results can be applied specifically to the model $z_t = S_t + N_t$, where we assume
that the signal process $\{S_t\}$ and the noise process $\{N_t\}$ are independent and satisfy ARMA
models, $\phi_s(B)S_t = \theta_s(B)b_t$ and $\phi_n(B)N_t = \theta_n(B)c_t$, where $b_t$ and $c_t$ are independent white
noise processes with variances $\sigma_b^2$ and $\sigma_c^2$. It follows from Appendix A4.3 that the observed
process $z_t$ also satisfies an ARMA model $\phi(B)z_t = \theta(B)a_t$, where $\phi(B) = \phi_s(B)\phi_n(B)$,
assuming no common factors in the AR operators. It then follows that the optimal linear
“smoother” $\hat{S}_t = \sum_{u=-\infty}^{\infty} \nu_u z_{t-u} = \nu(B)z_t$ of $S_t$, based on the infinite set of values
$z_u$, $-\infty < u < \infty$, has a filter given by

$$\nu(B) = \frac{g_{ss}(B)}{g_{zz}(B)} = \frac{\sigma_b^2\,\phi(B)\phi(B^{-1})\theta_s(B)\theta_s(B^{-1})}{\sigma_a^2\,\theta(B)\theta(B^{-1})\phi_s(B)\phi_s(B^{-1})}$$  (9.4.11)

In practice, since the series $S_t$ and $N_t$ are not observable, the models for $S_t$ and $N_t$ would
usually not be known. Thus, the optimal filter would not be known in practice. However, by
developing a model for the observed series $z_t$ and placing certain restrictions on the form of
the models for $S_t$ and $N_t$ beyond those implied by the model for $z_t$ (e.g., by assuming $N_t$ is
white noise with the largest possible variance), one may obtain reasonable approximations
to the optimal filter $\nu(B)$. While optimal smoothing results, such as (9.4.10), have been
derived for the case where $S_t$ and $N_t$ are stationary processes, Bell (1984) showed that the
results extend to the nonstationary case under reasonable assumptions for the nonstationary
signal $S_t$ and noise $N_t$ processes.

As noted earlier, an alternative to the classical filtering approach in the structural
components models is to express the model in state-space form and use Kalman filtering and
smoothing techniques to estimate the components, as illustrated, for example, by Kitagawa 
and Gersch (1984). For further discussion of this approach, see also Harvey (1989) and 
Durbin and Koopman (2012). 

Seasonal Adjustments. The filtering and smoothing methods described above have
applications to seasonal adjustments of economic and business time series (i.e., estimating and
removing the seasonal component from the series). Approaches of the type discussed were
used by Hillmer and Tiao (1982) to decompose a time series uniquely into mutually
independent seasonal, trend, and irregular components. A model-based approach to seasonal
adjustments was also considered by Cleveland and Tiao (1976). Seasonal adjustments are
commonly performed by statistical agencies in the U.S. and elsewhere, and the methods
used have received considerable attention in the literature. For an overview and further
discussion, see, for example, Ghysels and Osborn (2001, Chapter 4), Bell and Sotiris (2010),
Chu, Tiao, and Bell (2012), and Bell, Chu, and Tiao (2012).

9.5 REGRESSION MODELS WITH TIME SERIES ERROR TERMS 

The previous discussion of deterministic components in Section 9.4.2 motivates
consideration of time series models that include regression terms such as deterministic sine and cosine
functions to represent seasonal behavior or stochastic predictor variables, in addition to a
serially correlated “noise” or error term. We will assume that the noise series $N_t$ follows
a stationary ARMA process; otherwise, differencing may need to be considered. Thus,
letting $w_t$ be a “response” series of interest, we wish to represent $w_t$ in terms of its linear
dependence on $k$ explanatory or predictor time series variables $x_{t1}, \ldots, x_{tk}$ as follows:

$$w_t = \beta_1 x_{t1} + \beta_2 x_{t2} + \cdots + \beta_k x_{tk} + N_t \qquad t = 1, \ldots, n$$  (9.5.1)

where the errors $N_t$ follow a zero-mean ARMA($p$, $q$) model, $\phi(B)N_t = \theta(B)a_t$. The
traditional linear regression model was reviewed briefly in Appendix A7.2. Using similar
notation with $w = (w_1, \ldots, w_n)'$, $N = (N_1, \ldots, N_n)'$, and $\beta = (\beta_1, \ldots, \beta_k)'$, the model (9.5.1)
may be written in matrix form as $w = X\beta + N$, with covariance matrix $V = \text{cov}[N]$.
In the standard regression model, the errors $N_t$ are assumed to be uncorrelated with
common variance $\sigma_N^2$, so that $V = \sigma_N^2 I$, and the ordinary least squares (LS) estimator
$\hat{\beta} = (X'X)^{-1}X'w$ has well-known properties such as $\text{cov}[\hat{\beta}] = \sigma_N^2(X'X)^{-1}$. However, in
the case of autocorrelated errors, this property no longer holds and the ordinary least-squares
estimator has covariance matrix

$$\text{cov}[\hat{\beta}] = (X'X)^{-1}X'VX(X'X)^{-1}$$

Moreover, standard inference procedures based on the $t$ and $F$ distributions are no longer
valid due to the lack of independence.
When $\text{cov}[N] = V \neq \sigma_N^2 I$, the best linear unbiased estimator of $\beta$ is the generalized
least-squares (GLS) estimator given by

$$\hat{\beta}_G = (X'V^{-1}X)^{-1}X'V^{-1}w$$  (9.5.2)

which has $\text{cov}[\hat{\beta}_G] = (X'V^{-1}X)^{-1}$. The estimator $\hat{\beta}_G$ is the best linear unbiased estimator
in the sense that $\text{var}[c'\hat{\beta}_G]$ is a minimum among all possible linear unbiased estimators
of $\beta$, for every arbitrary $k$-dimensional vector of constants $c' = (c_1, \ldots, c_k)$; in particular,
$\text{var}[c'\hat{\beta}_G] \leq \text{var}[c'\hat{\beta}]$ holds relative to the ordinary LS estimator $\hat{\beta}$. It follows that $\hat{\beta}_G$
in (9.5.2) is the estimate of $\beta$ obtained by minimizing the generalized sum of squares
$S(\beta; V) = (w - X\beta)'V^{-1}(w - X\beta)$ with $V$ given. This estimator also corresponds to the
maximum likelihood estimator under the assumption of normality of the errors when the
covariance matrix $V$ is known. Of course, a practical limitation to use of the GLS estimate
$\hat{\beta}_G$ is that the ARMA noise model and its parameters $\phi$ and $\theta$ needed to determine $V$ must
be known, which is typically not true in practice. This motivates an iterative model building
and estimation procedure discussed below.
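
The difference between the two covariance expressions can be illustrated with a single regressor and AR(1) errors. The following Python sketch is an illustration under stated assumptions: the one-regressor model, the linear-trend regressor, the AR(1) parameter, and the helper names are all hypothetical choices. It evaluates the ordinary least-squares variance $(x'x)^{-1}x'Vx(x'x)^{-1}$ and the GLS variance $(x'V^{-1}x)^{-1}$.

```python
def solve(A, b):
    """Solve A y = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        y[r] = (M[r][n] - sum(M[r][c] * y[c] for c in range(r + 1, n))) / M[r][r]
    return y


def ar1_cov(n, phi, var_a):
    """Covariance matrix of a stationary AR(1) noise series of length n."""
    return [[var_a * phi ** abs(i - j) / (1.0 - phi * phi) for j in range(n)]
            for i in range(n)]


def slope_variances(x, V):
    """Variances of the OLS and GLS estimators in w_t = beta * x_t + N_t."""
    xx = sum(v * v for v in x)
    Vx = [sum(vij * xj for vij, xj in zip(row, x)) for row in V]
    var_ols = sum(xi * vxi for xi, vxi in zip(x, Vx)) / xx ** 2    # (x'x)^-1 x'Vx (x'x)^-1
    Vinv_x = solve(V, x)
    var_gls = 1.0 / sum(xi * yi for xi, yi in zip(x, Vinv_x))      # (x'V^-1 x)^-1
    return var_ols, var_gls


x = [float(t) for t in range(1, 21)]      # hypothetical linear-trend regressor
var_ols, var_gls = slope_variances(x, ar1_cov(len(x), phi=0.7, var_a=1.0))
var_ols_iid, var_gls_iid = slope_variances(x, ar1_cov(len(x), phi=0.0, var_a=1.0))
```

With uncorrelated errors the two variances coincide; with autocorrelated errors the GLS variance is never larger than the OLS variance, as the optimality property of $\hat{\beta}_G$ requires.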

9.5.1 Model Building, Estimation, and Forecasting Procedures for Regression 
Models 

When a regression model is fitted to time series data, one should always consider the 
possibility that the errors are autocorrelated. Often, a reasonable approach to identify an
appropriate model for the error $N_t$ is first to obtain the least-squares estimate $\hat{\beta}$, and then
compute the corresponding regression model residuals

$$\hat{N}_t = w_t - \hat{\beta}_1 x_{t1} - \hat{\beta}_2 x_{t2} - \cdots - \hat{\beta}_k x_{tk}$$  (9.5.3)

This residual series can be examined by the usual time series methods, such as inspection
of its sample ACF and PACF, to identify an appropriate ARMA model for $N_t$. This would
typically be adequate to specify a tentative model for the error term $N_t$, especially when
the explanatory variables $x_{ti}$ are deterministic functions such as sine and cosine functions,
or polynomial terms. In such cases, it is known (e.g., Anderson, 1971, Section 10.2) that
the least-squares estimator for $\beta$ is an asymptotically efficient estimator relative to the
best linear estimator. In addition, it is known that the sample autocorrelations and partial
autocorrelations calculated using the residuals from the preliminary least-squares fit are
asymptotically equivalent to those obtained from the actual noise series $N_t$ (e.g., Anderson,
1971, Section 10.3; Fuller, 1996, Section 9.3).

Hence, the complete model that we consider is

$$w_t = x_t'\beta + N_t \qquad \phi(B)(1 - B)^d N_t = \theta(B)a_t \qquad t = 1, \ldots, n$$  (9.5.4)

where $x_t = (x_{t1}, \ldots, x_{tk})'$. Estimates of all parameters can be obtained by maximum
likelihood methods. The resulting estimate for $\beta$ has the GLS form

$$\hat{\beta}_G = (X'V^{-1}X)^{-1}X'V^{-1}w$$

but where $V$ is replaced by the estimate $\hat{V}$ obtained from the MLEs $\hat{\phi}_1, \ldots, \hat{\phi}_p$, $\hat{\theta}_1, \ldots, \hat{\theta}_q$
of the ARMA parameters for $N_t$. Also, $\text{cov}[\hat{\beta}_G] \approx (X'\hat{V}^{-1}X)^{-1}$. The estimation can be
performed iteratively, alternating between calculation of $\hat{\beta}_G$ for given estimates $\hat{\phi}$ and $\hat{\theta}$,
and reestimation of $\phi$ and $\theta$, given $\hat{\beta}_G$ and the estimated noise series $\hat{N}_t = w_t - x_t'\hat{\beta}_G$.

Transformed Model. With the ARMA model specified for N t , the computation of the 
generalized least-squares estimator of ft can be carried out in a computationally convenient 
manner as follows. Let P' be a lower triangular matrix, such that P'VP = rx 2 I, that is, 

y 1 = PP / 

/o 2 . Then, as in Appendix A7.2.5, the GLS estimator can be obtained from the 
transformed regression model 

$$P'w = P'X\beta + P'N \tag{9.5.5}$$

or $w^* = X^*\beta + a$, where the transformed variables are $w^* = P'w$, $X^* = P'X$, and $a = P'N$.
Since the covariance matrix of the error vector $a = P'N$ in the transformed model is

$$\operatorname{cov}[a] = P'\operatorname{cov}[N]P = P'VP = \sigma_a^2 I$$

we can now use ordinary least-squares to estimate $\beta$ in the transformed model. That is, the
GLS estimator of $\beta$ is obtained as the LS estimator in terms of the transformed variables
$w^*$ and $X^*$ as

$$\hat{\beta}_G = (X^{*\prime}X^*)^{-1}X^{*\prime}w^* \quad \text{with} \quad \operatorname{cov}[\hat{\beta}_G] = \sigma_a^2 (X^{*\prime}X^*)^{-1} \tag{9.5.6}$$

However, since the ARMA parameters for $N_t$ are not known in practice, one must
still iterate between the computation of $\hat{\beta}_G$ using the current estimates of $\phi$ and $\theta$ to
form the transformation matrix $P'$, and estimation of the ARMA parameters based on
$\hat{N}_t = w_t - x_t'\hat{\beta}_G$ constructed from the current estimate of $\beta$. The computational procedure
used to determine the exact sum-of-squares function for the specified ARMA model will
also essentially determine the nature of the transformation matrix $P'$. For instance, the
innovations algorithm described in Section 7.4 gives the sum of squares for an ARMA
model as $S(\phi, \theta) = \sigma_a^2 w'V^{-1}w = e'D^{-1}e$, where $e = G^{-1}L_\phi w$ and $D = \operatorname{diag}(v_1, \ldots, v_n)$,
and $G$ and $L_\phi$ are specific lower triangular matrices. Hence, the innovations algorithm
can be viewed as providing the transformation matrix $P' = D^{-1/2}G^{-1}L_\phi$ such that
$w^* = D^{-1/2}G^{-1}L_\phi w = P'w$ has covariance matrix of the "standard" form

$$\operatorname{cov}[w^*] = P'\operatorname{cov}[w]P = D^{-1/2}G^{-1}L_\phi \operatorname{cov}[w] L_\phi' G'^{-1} D^{-1/2} = \sigma_a^2 I$$

Therefore, the required transformed variables $w^* = P'w$ and $X^* = P'X$ in (9.5.6) can
be obtained by applying the innovations algorithm recursive calculations (e.g., (7.4.9)) to
the series $w = (w_1, \ldots, w_n)'$ and to each column, $x_i = (x_{1i}, \ldots, x_{ni})'$, $i = 1, \ldots, k$, of the
matrix $X$.

Example. We take the simple example of an AR(1) model, $(1 - \phi B)N_t = a_t$, for the noise
$N_t$, for illustration. Then the covariance matrix $V$ of $N$ has $(i, j)$th element given by
$\gamma_{i-j} = \sigma_a^2 \phi^{|i-j|}/(1 - \phi^2)$. The $n \times n$ matrix $P'$ such that $P'VP = \sigma_a^2 I$ has its (1, 1) element
equal to $(1 - \phi^2)^{1/2}$, its remaining diagonal elements equal to 1, its first subdiagonal
elements equal to $-\phi$, and all remaining elements equal to zero. Hence, the transformed
variables are $w_1^* = (1 - \phi^2)^{1/2} w_1$ and $w_t^* = w_t - \phi w_{t-1}$, $t = 2, 3, \ldots, n$, and similarly for
the transformed explanatory variables $x_{ti}^*$. In effect, with AR(1) errors, the original model
(9.5.1) has been transformed by applying the AR(1) operator $(1 - \phi B)$ throughout the
equation to obtain

$$w_t - \phi w_{t-1} = \beta_1 (x_{t1} - \phi x_{t-1,1}) + \beta_2 (x_{t2} - \phi x_{t-1,2}) + \cdots + \beta_k (x_{tk} - \phi x_{t-1,k}) + a_t \tag{9.5.7}$$

or, equivalently, $w_t^* = \beta_1 x_{t1}^* + \beta_2 x_{t2}^* + \cdots + \beta_k x_{tk}^* + a_t$, where the errors $a_t$ now are
uncorrelated. Thus, ordinary least-squares applies to the transformed regression model, and the
resulting estimator is the same as the GLS estimator in the original regression model.
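The AR(1) transformation can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the book's implementation: $\phi$ is treated as known, the function names (`ar1_transform`, `ols`, `gls_ar1`) are hypothetical, and ordinary least squares is solved through the normal equations, which is adequate only for small, well-conditioned problems.

```python
import math

def ar1_transform(series, phi):
    """Apply P' for AR(1) noise: rescale the first value, quasi-difference the rest."""
    first = math.sqrt(1.0 - phi * phi) * series[0]
    rest = [series[t] - phi * series[t - 1] for t in range(1, len(series))]
    return [first] + rest

def ols(columns, y):
    """Least squares via the normal equations, solved by Gauss-Jordan elimination."""
    k = len(columns)
    # Augmented matrix [X'X | X'y], built column by column.
    a = [[sum(ci * cj for ci, cj in zip(columns[i], columns[j])) for j in range(k)]
         + [sum(ci * yi for ci, yi in zip(columns[i], y))] for i in range(k)]
    for i in range(k):
        pivot = max(range(i, k), key=lambda r: abs(a[r][i]))
        a[i], a[pivot] = a[pivot], a[i]
        for r in range(k):
            if r != i:
                f = a[r][i] / a[i][i]
                a[r] = [x - f * z for x, z in zip(a[r], a[i])]
    return [a[i][k] / a[i][i] for i in range(k)]

def gls_ar1(columns, w, phi):
    """GLS estimate of beta: OLS on the transformed variables w* and X*."""
    star_columns = [ar1_transform(c, phi) for c in columns]
    return ols(star_columns, ar1_transform(w, phi))

# Usage: intercept plus one regressor, with phi assumed known.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
w = [2.0 + 3.0 * xi for xi in x]
beta_hat = gls_ar1([[1.0] * len(x), x], w, phi=0.6)
```

Note that the intercept column is transformed along with the other regressors, so the transformed model generally has no constant column of ones.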

Generalization of the transformation procedure to higher order AR models is
straightforward. Apart from special treatment for the initial $p$ observations, the transformed
variables are $w_t^* = \phi(B)w_t = w_t - \phi_1 w_{t-1} - \cdots - \phi_p w_{t-p}$ and
$x_{ti}^* = \phi(B)x_{ti} = x_{ti} - \phi_1 x_{t-1,i} - \cdots - \phi_p x_{t-p,i}$, $i = 1, \ldots, k$. The exact form of
the transformation in the case of mixed ARMA models will be more complicated [an approximate
form is $w_t^* \approx \theta^{-1}(B)\phi(B)w_t$, and so on] but can be determined through the same procedure as
is used to construct the exact sum-of-squares function for the ARMA model.

Forecasting. Forecasting for regression models with time series errors is straightforward
when future values $x_{t+l,i}$ of the explanatory variables are known, as would be the case
for deterministic functions such as sine and cosine functions, for example. Then, based on
forecast origin $t$, the lead $l$ forecast of

$$w_{t+l} = \beta_1 x_{t+l,1} + \cdots + \beta_k x_{t+l,k} + N_{t+l}$$

based on past values through time $t$, is

$$\hat{w}_t(l) = \beta_1 x_{t+l,1} + \beta_2 x_{t+l,2} + \cdots + \beta_k x_{t+l,k} + \hat{N}_t(l) \tag{9.5.8}$$

where $\hat{N}_t(l)$ is the usual $l$-step-ahead forecast of $N_{t+l}$ from the ARMA($p$, $q$) model,
$\phi(B)N_t = \theta(B)a_t$, based on the past values of the noise series $N_t$. The forecast error
is

$$e_t(l) = w_{t+l} - \hat{w}_t(l) = N_{t+l} - \hat{N}_t(l) = \sum_{i=0}^{l-1} \psi_i a_{t+l-i} \tag{9.5.9}$$

with $V(l) = \operatorname{var}[e_t(l)] = \sigma_a^2 \sum_{i=0}^{l-1} \psi_i^2$, just the forecast error and its variance
from the ARMA model for the noise series $N_t$, where the $\psi_i$ are the coefficients in
$\psi(B) = \phi^{-1}(B)\theta(B)$ for the noise model.
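The forecast error variance can be computed directly from the $\psi$ weights of the noise model. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes the sign convention $\phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^p$ and $\theta(B) = 1 - \theta_1 B - \cdots - \theta_q B^q$, so that $\psi_j = \phi_1 \psi_{j-1} + \cdots + \phi_p \psi_{j-p} - \theta_j$ with $\psi_0 = 1$.

```python
def psi_weights(phi, theta, n):
    """First n psi weights of psi(B) = theta(B) / phi(B)."""
    psi = [1.0]
    for j in range(1, n):
        # The moving average term contributes -theta_j while j <= q.
        value = -theta[j - 1] if j <= len(theta) else 0.0
        # The autoregressive recursion uses the weights already computed.
        for i in range(1, min(j, len(phi)) + 1):
            value += phi[i - 1] * psi[j - i]
        psi.append(value)
    return psi

def forecast_variance(phi, theta, sigma2, lead):
    """V(l) = sigma_a^2 * (psi_0^2 + ... + psi_{l-1}^2) for lead >= 1."""
    return sigma2 * sum(p * p for p in psi_weights(phi, theta, lead))

# Usage: AR(1) noise with phi = 0.5 and unit innovation variance.
v3 = forecast_variance([0.5], [], 1.0, 3)
```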


Example. For the model

$$w_t = \beta_0 + \beta_1 \cos\left(\frac{2\pi t}{12}\right) + \beta_2 \sin\left(\frac{2\pi t}{12}\right) + N_t$$

where $(1 - \phi B)N_t = a_t$, the forecasts are

$$\hat{w}_t(l) = \beta_0 + \beta_1 \cos\left[\frac{2\pi (t + l)}{12}\right] + \beta_2 \sin\left[\frac{2\pi (t + l)}{12}\right] + \hat{N}_t(l)$$

with $\hat{N}_t(l) = \phi^l N_t$. Note that these forecasts are similar in functional form to those that
would be obtained in an ARMA(1, 3) model (with zero constant term) for the series
$(1 - B)(1 - \sqrt{3}B + B^2)w_t$, except that the $\beta$ coefficients in the forecast function for the
regression model case are deterministic, not adaptive, as was noted at the end of Section
9.4.2.
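As a small numerical illustration of this forecast function, the Python sketch below evaluates $\hat{w}_t(l)$ for given parameter values. The function name and the numbers used are hypothetical and serve only to show how the deterministic seasonal part and the decaying noise forecast $\phi^l N_t$ combine.

```python
import math

def harmonic_forecast(beta, phi, noise_t, t, lead):
    """Lead-l forecast for the 12-month harmonic regression with AR(1) noise."""
    b0, b1, b2 = beta
    angle = 2.0 * math.pi * (t + lead) / 12.0
    seasonal = b0 + b1 * math.cos(angle) + b2 * math.sin(angle)
    # The AR(1) noise forecast decays geometrically toward zero.
    return seasonal + (phi ** lead) * noise_t

# Usage: forecasts for leads 1 to 24 from origin t = 0 (illustrative values).
forecasts = [harmonic_forecast((10.0, 2.0, 1.0), 0.5, 4.0, 0, lead)
             for lead in range(1, 25)]
```

At long leads the noise contribution becomes negligible and the forecasts settle onto the fixed seasonal curve, which is the sense in which the coefficients are deterministic rather than adaptive.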

In practice, estimates of $\beta$ and the time series model parameters would be used to obtain
the estimated noise series $\hat{N}_t$ from which forecasts of future values would be made. The
effect of parameter estimation errors on the variance of the corresponding forecast error was
investigated by Baillie (1979) for regression models with autoregressive errors, generalizing
a similar study by Yamamoto (1976) conducted for pure autoregressive models.

More detailed discussions of regression analysis with time series errors are given by 
Harvey and Phillips (1979) and by Wincek and Reinsel (1986), who also consider the 
possibility of missing data. A state-space approach with associated Kalman filtering
calculations, as discussed in Section 7.4, can be employed for the regression model with time
series errors, and this corresponds to one particular choice for the transformation matrix 
P' in the above discussion. A specific application of the use of regression models with 
time series errors to model calendar effects in seasonal time series was given by Bell and 
Hillmer (1983), while Reinsel and Tiao (1987) used regression models with time series 
errors to model atmospheric ozone data for estimation of trends. 

One common application of regression models for seasonal time series is where seasonality
can be modeled as a deterministic seasonal mean model. Then, for monthly seasonal
data, for example, we might consider a model of the form



where $N_t$ is modeled as an ARIMA process. As an example, Reinsel and Tiao (1987)
consider the time series $z_t$ of monthly averages of atmospheric total column ozone measured
at the station Aspendale, Australia, for the period from 1958 to 1984. This series is highly
seasonal, and so in terms of ARIMA modeling, the seasonal differences $w_t = (1 - B^{12})z_t$
were considered. Based on the sample ACF and PACF of $w_t$, the following model was
specified and estimated,

$$(1 - 0.48B - 0.22B^2)(1 - B^{12})z_t = (1 - 0.99B^{12})a_t$$

and the model was found to be adequate. We see that this model contains a near-common
seasonal difference factor $(1 - B^{12})$, and consequently, it is equivalent to the model that
contains a deterministic seasonal component, $z_t = S_t + N_t$, of exactly the form given
in (9.5.10), and where $N_t$ follows the AR(2) model, $(1 - 0.48B - 0.22B^2)N_t = a_t$. This
model was estimated using regression methods similar to those discussed above.

Sometimes, the effects of a predictor variable $\{x_t\}$ on $z_t$ are not confined to a single time
period $t$, but the effects are more dynamic over time and are "distributed" over several
time periods. With a single predictor variable, this would lead to models of the form

$$z_t = \beta_0 + \beta_1 x_t + \beta_2 x_{t-1} + \beta_3 x_{t-2} + \cdots + N_t$$

where $N_t$ might be an ARIMA process. For parsimonious modeling, the regression
coefficients $\beta_i$ can be formulated as specific functions of a small number of unknown parameters.
Such models are referred to as transfer function models or dynamic regression models, and
will be considered in detail in Chapters 11 and 12.

Remark. Note that regression models with autocorrelated errors can be fitted to data using
the arima() function in R with an argument xreg added to account for regression terms;
type help(arima) for details. For further discussion, see also Venables and Ripley (2002).
An alternative available in the MTS package of R is the function tfm1() that can be used
to fit a regression model with a single input variable $X_t$. We demonstrate the use of this
function to develop a dynamic regression model in Chapter 12. A similar function which
allows for two input series is also available in the MTS package of R.


9.5.2 Restricted Maximum Likelihood Estimation for Regression Models 

A detracting feature of the maximum likelihood estimator (MLE) of the ARMA parameters
in the linear regression model (9.5.1) is that the MLE can produce a nonnegligible bias for
small to moderate sample sizes. This bias could have significant impact on inferences of the
regression parameters $\beta$ based on the GLS estimation, through the approximation
$\operatorname{cov}[\hat{\beta}_G] \approx (X'\hat{V}^{-1}X)^{-1}$, where $\hat{V}$ involves the ML estimates of the ARMA parameters. One
"preventive" approach for reducing the bias is to use the restricted maximum likelihood
(REML) estimation procedure, also known as the residual maximum likelihood estimation
procedure, for the ARMA model parameters.

The REML method has been popular and commonly used in the estimation of variance 
components in mixed-effects linear models. For ARMA models, this procedure has been 
used by Cooper and Thompson (1977) and Tunnicliffe Wilson (1989), among others. 
Cheang and Reinsel (2000, 2003) compared the ML and REML estimation methods, and 
bias characteristics in particular, for time series regression models with AR and ARMA 
noise (as well as fractional ARIMA noise, see Section 10.4). They established approximate 
bias characteristics for these estimators, and confirmed empirically that REML typically 
reduces the bias substantially over ML estimation. Consequently, the REML approach 
leads to more accurate inferences about the regression parameters. 

The REML estimation of the parameters in the ARMA noise models differs from the
ML estimation in that it explicitly takes into account the fact that the regression parameters
$\beta$ are unknown and must be estimated (i.e., estimation of ARMA parameters relies on
the residuals $\hat{N}_t = w_t - x_t'\hat{\beta}_G$ rather than the "true" noise $N_t = w_t - x_t'\beta$). In the REML
estimation method, the estimates of $\phi$, $\theta$, and $\sigma_a^2$ are determined so as to maximize the
restricted likelihood function. This is the likelihood function based on observation of
the "residual vector" of error contrasts $u = H'w$ only, whose distribution is free of the
regression parameters $\beta$, rather than the likelihood based on the "full" vector of observations
$w$. Here, $H'$ is any $(n - k) \times n$ full rank matrix such that $H'X = 0$, so the regression effects
are eliminated in $u = H'w$ and its distribution is free of the parameters $\beta$.

Assuming normality, the distribution of $w$ is normal with mean vector $E(w) = X\beta$ and
covariance matrix $\operatorname{cov}[w] = V$, which we write as $V = \sigma_a^2 V_*$ for convenience of notation.
Then, $u = H'w$ has normal distribution with zero mean vector and covariance matrix
$\operatorname{cov}[u] = \sigma_a^2 H'V_*H$. Thus, the likelihood of $\phi$, $\theta$, and $\sigma_a^2$ based on $u$, that is, the density of
$u$, is

$$p(u \mid \phi, \theta, \sigma_a^2) = (2\pi\sigma_a^2)^{-(n-k)/2} \, |H'V_*H|^{-1/2} \exp\left[-\frac{1}{2\sigma_a^2} u'(H'V_*H)^{-1}u\right]$$

It has been established (e.g., Harville, 1974, 1977), however, that this likelihood (i.e.,
density) can be expressed in an equivalent form that does not involve the particular choice
of error contrast matrix $H'$ as

$$L_*(\phi, \theta, \sigma_a^2) = p(u \mid \phi, \theta, \sigma_a^2) = (2\pi\sigma_a^2)^{-(n-k)/2} \, |X'X|^{1/2} \, |V_*|^{-1/2} \, |X'V_*^{-1}X|^{-1/2} \exp\left[-\frac{1}{2\sigma_a^2} S(\hat{\beta}_G, \phi, \theta)\right] \tag{9.5.11}$$

where

$$S(\hat{\beta}_G, \phi, \theta) = (w - X\hat{\beta}_G)'V_*^{-1}(w - X\hat{\beta}_G) = w'\left(V_*^{-1} - V_*^{-1}X(X'V_*^{-1}X)^{-1}X'V_*^{-1}\right)w$$

and $\hat{\beta}_G = (X'V_*^{-1}X)^{-1}X'V_*^{-1}w$. Evaluation of the restricted likelihood (9.5.11) requires
little additional computational effort beyond that of the "full" likelihood, only the
additional factor $|X'V_*^{-1}X|$. Therefore, numerical determination of the REML estimates of
$\phi$, $\theta$, and $\sigma_a^2$ is very similar to methods for ML estimation of the ARMA model parameters.
However, one difference is that the REML estimate of $\sigma_a^2$ takes into account the loss in
degrees of freedom that results from estimating the regression parameters and is given
by $\hat{\sigma}_a^2 = S(\hat{\beta}_G, \hat{\phi}, \hat{\theta})/(n - k)$ as opposed to $S(\hat{\beta}_G, \hat{\phi}, \hat{\theta})/n$ for the ML estimate, although
arguments can be put forth for use of the divisor $n - k - p - q$ rather than $n - k$ in the
REML estimate $\hat{\sigma}_a^2$. For further discussion and details related to REML estimation, see
Tunnicliffe Wilson (1989) and Cheang and Reinsel (2000, 2003).


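APPENDIX A9.1 AUTOCOVARIANCES FOR SOME SEASONAL MODELS


See the following Table A9.1:


TABLE A9.1 Autocovariances for Some Seasonal Models

For each model, the entries give the model, the (autocovariances of $w_t$)/$\sigma_a^2$, and the
special characteristics of the autocovariance function.

(1) Model: $w_t = (1 - \theta B)(1 - \Theta B^s)a_t$, that is,
$w_t = a_t - \theta a_{t-1} - \Theta a_{t-s} + \theta\Theta a_{t-s-1}$, $s \geq 3$

Autocovariances:
$\gamma_0 = (1 + \theta^2)(1 + \Theta^2)$
$\gamma_1 = -\theta(1 + \Theta^2)$
$\gamma_{s-1} = \theta\Theta$
$\gamma_s = -\Theta(1 + \theta^2)$
$\gamma_{s+1} = \gamma_{s-1}$
All other autocovariances are zero.

Special characteristics:
(a) $\gamma_{s-1} = \gamma_{s+1}$
(b) $\rho_{s-1} = \rho_{s+1} = \rho_1 \rho_s$

(2) Model: $(1 - \Phi B^s)w_t = (1 - \theta B)(1 - \Theta B^s)a_t$, that is,
$w_t - \Phi w_{t-s} = a_t - \theta a_{t-1} - \Theta a_{t-s} + \theta\Theta a_{t-s-1}$, $s \geq 3$

Autocovariances:
$\gamma_0 = (1 + \theta^2)\left[1 + (\Theta - \Phi)^2/(1 - \Phi^2)\right]$
$\gamma_1 = -\theta\left[1 + (\Theta - \Phi)^2/(1 - \Phi^2)\right]$
$\gamma_{s-1} = \theta\left[\Theta - \Phi - \Phi(\Theta - \Phi)^2/(1 - \Phi^2)\right]$
$\gamma_s = -(1 + \theta^2)\left[\Theta - \Phi - \Phi(\Theta - \Phi)^2/(1 - \Phi^2)\right]$
$\gamma_{s+1} = \gamma_{s-1}$
$\gamma_j = \Phi\gamma_{j-s}$, $j \geq s + 2$
For $s \geq 4$, $\gamma_2, \gamma_3, \ldots, \gamma_{s-2}$ are all zero.

Special characteristics:
(a) $\gamma_{s-1} = \gamma_{s+1}$
(b) $\gamma_j = \Phi\gamma_{j-s}$, $j \geq s + 2$

(3) Model: $w_t = (1 - \theta_1 B - \theta_2 B^2)(1 - \Theta_1 B^s - \Theta_2 B^{2s})a_t$, that is,
$w_t = a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2} - \Theta_1 a_{t-s} + \theta_1\Theta_1 a_{t-s-1} + \theta_2\Theta_1 a_{t-s-2}
- \Theta_2 a_{t-2s} + \theta_1\Theta_2 a_{t-2s-1} + \theta_2\Theta_2 a_{t-2s-2}$, $s \geq 5$

Autocovariances:
$\gamma_0 = (1 + \theta_1^2 + \theta_2^2)(1 + \Theta_1^2 + \Theta_2^2)$
$\gamma_1 = -\theta_1(1 - \theta_2)(1 + \Theta_1^2 + \Theta_2^2)$
$\gamma_2 = -\theta_2(1 + \Theta_1^2 + \Theta_2^2)$
$\gamma_{s-2} = \theta_2\Theta_1(1 - \Theta_2)$
$\gamma_{s-1} = \theta_1\Theta_1(1 - \theta_2)(1 - \Theta_2)$
$\gamma_s = -\Theta_1(1 - \Theta_2)(1 + \theta_1^2 + \theta_2^2)$
$\gamma_{s+1} = \gamma_{s-1}$
$\gamma_{s+2} = \gamma_{s-2}$
$\gamma_{2s-2} = \theta_2\Theta_2$
$\gamma_{2s-1} = \theta_1(1 - \theta_2)\Theta_2$
$\gamma_{2s} = -\Theta_2(1 + \theta_1^2 + \theta_2^2)$
$\gamma_{2s+1} = \gamma_{2s-1}$
$\gamma_{2s+2} = \gamma_{2s-2}$
All other autocovariances are zero.

Special characteristics:
(a) $\gamma_{s-2} = \gamma_{s+2}$
(b) $\gamma_{s-1} = \gamma_{s+1}$
(c) $\gamma_{2s-2} = \gamma_{2s+2}$
(d) $\gamma_{2s-1} = \gamma_{2s+1}$

(3a) Special case of model 3: $w_t = (1 - \theta_1 B - \theta_2 B^2)(1 - \Theta B^s)a_t$, that is,
$w_t = a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2} - \Theta a_{t-s} + \theta_1\Theta a_{t-s-1} + \theta_2\Theta a_{t-s-2}$, $s \geq 5$

Autocovariances:
$\gamma_0 = (1 + \theta_1^2 + \theta_2^2)(1 + \Theta^2)$
$\gamma_1 = -\theta_1(1 - \theta_2)(1 + \Theta^2)$
$\gamma_2 = -\theta_2(1 + \Theta^2)$
$\gamma_{s-2} = \theta_2\Theta$
$\gamma_{s-1} = \theta_1(1 - \theta_2)\Theta$
$\gamma_s = -\Theta(1 + \theta_1^2 + \theta_2^2)$
$\gamma_{s+1} = \gamma_{s-1}$
$\gamma_{s+2} = \gamma_{s-2}$
All other autocovariances are zero.

Special characteristics:
(a) $\gamma_{s-2} = \gamma_{s+2}$
(b) $\gamma_{s-1} = \gamma_{s+1}$

(3b) Special case of model 3: $w_t = (1 - \theta B)(1 - \Theta_1 B^s - \Theta_2 B^{2s})a_t$, that is,
$w_t = a_t - \theta a_{t-1} - \Theta_1 a_{t-s} + \theta\Theta_1 a_{t-s-1} - \Theta_2 a_{t-2s} + \theta\Theta_2 a_{t-2s-1}$, $s \geq 3$

Autocovariances:
$\gamma_0 = (1 + \theta^2)(1 + \Theta_1^2 + \Theta_2^2)$
$\gamma_1 = -\theta(1 + \Theta_1^2 + \Theta_2^2)$
$\gamma_{s-1} = \theta\Theta_1(1 - \Theta_2)$
$\gamma_s = -\Theta_1(1 - \Theta_2)(1 + \theta^2)$
$\gamma_{s+1} = \gamma_{s-1}$
$\gamma_{2s-1} = \theta\Theta_2$
$\gamma_{2s} = -\Theta_2(1 + \theta^2)$
$\gamma_{2s+1} = \gamma_{2s-1}$
All other autocovariances are zero.

Special characteristics:
(a) $\gamma_{s-1} = \gamma_{s+1}$
(b) $\gamma_{2s-1} = \gamma_{2s+1}$

(4) Model: $w_t = (1 - \theta_1 B - \theta_s B^s - \theta_{s+1} B^{s+1})a_t$, that is,
$w_t = a_t - \theta_1 a_{t-1} - \theta_s a_{t-s} - \theta_{s+1} a_{t-s-1}$, $s \geq 3$

Autocovariances:
$\gamma_0 = 1 + \theta_1^2 + \theta_s^2 + \theta_{s+1}^2$
$\gamma_1 = -\theta_1 + \theta_s\theta_{s+1}$
$\gamma_{s-1} = \theta_1\theta_s$
$\gamma_s = -\theta_s + \theta_1\theta_{s+1}$
$\gamma_{s+1} = -\theta_{s+1}$
All other autocovariances are zero.

Special characteristics:
(a) In general, $\gamma_{s-1} \neq \gamma_{s+1}$ and $\gamma_1\gamma_s \neq \gamma_{s+1}$

(4a) Special case of model 4: $w_t = (1 - \theta_1 B - \theta_s B^s)a_t$, that is,
$w_t = a_t - \theta_1 a_{t-1} - \theta_s a_{t-s}$, $s \geq 3$

Autocovariances:
$\gamma_0 = 1 + \theta_1^2 + \theta_s^2$
$\gamma_1 = -\theta_1$
$\gamma_{s-1} = \theta_1\theta_s$
$\gamma_s = -\theta_s$
All other autocovariances are zero.

Special characteristics:
(a) Unlike model 4, $\gamma_{s+1} = 0$

(5) Model: $(1 - \Phi B^s)w_t = (1 - \theta_1 B - \theta_s B^s - \theta_{s+1} B^{s+1})a_t$, that is,
$w_t - \Phi w_{t-s} = a_t - \theta_1 a_{t-1} - \theta_s a_{t-s} - \theta_{s+1} a_{t-s-1}$, $s \geq 3$

Autocovariances:
$\gamma_0 = 1 + \theta_1^2 + \dfrac{(\theta_s - \Phi)^2}{1 - \Phi^2} + \dfrac{(\theta_{s+1} + \theta_1\Phi)^2}{1 - \Phi^2}$
$\gamma_1 = -\theta_1 + \dfrac{(\theta_s - \Phi)(\theta_{s+1} + \theta_1\Phi)}{1 - \Phi^2}$
$\gamma_{s-1} = (\theta_s - \Phi)\left[\theta_1 + \Phi\dfrac{\theta_{s+1} + \theta_1\Phi}{1 - \Phi^2}\right]$
$\gamma_s = -(\theta_s - \Phi) + \theta_1(\theta_{s+1} + \theta_1\Phi) + \Phi\dfrac{(\theta_s - \Phi)^2 + (\theta_{s+1} + \theta_1\Phi)^2}{1 - \Phi^2}$
$\gamma_{s+1} = -(\theta_{s+1} + \theta_1\Phi)\left[1 - \Phi\dfrac{\theta_s - \Phi}{1 - \Phi^2}\right]$
$\gamma_j = \Phi\gamma_{j-s}$, $j \geq s + 2$
For $s \geq 4$, $\gamma_2, \ldots, \gamma_{s-2}$ are all zero.

Special characteristics:
(a) $\gamma_{s-1} \neq \gamma_{s+1}$
(b) $\gamma_j = \Phi\gamma_{j-s}$, $j \geq s + 2$

(5a) Special case of model 5: $(1 - \Phi B^s)w_t = (1 - \theta_1 B - \theta_s B^s)a_t$, that is,
$w_t - \Phi w_{t-s} = a_t - \theta_1 a_{t-1} - \theta_s a_{t-s}$, $s \geq 3$

Autocovariances:
$\gamma_0 = 1 + \dfrac{\theta_1^2 + (\theta_s - \Phi)^2}{1 - \Phi^2}$
$\gamma_1 = -\theta_1\left[1 - \Phi\dfrac{\theta_s - \Phi}{1 - \Phi^2}\right]$
$\gamma_{s-1} = \dfrac{\theta_1(\theta_s - \Phi)}{1 - \Phi^2}$
$\gamma_s = \dfrac{\Phi\theta_1^2 - (\theta_s - \Phi)(1 - \Phi\theta_s)}{1 - \Phi^2}$
$\gamma_j = \Phi\gamma_{j-s}$, $j \geq s + 1$
For $s \geq 4$, $\gamma_2, \ldots, \gamma_{s-2}$ are all zero.

Special characteristics:
(a) Unlike model 5, $\gamma_{s+1} = \Phi\gamma_1$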
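Entries of this kind for the pure moving average models can be checked numerically, since the autocovariances of $w_t = \sum_j c_j a_{t-j}$ are $\gamma_k = \sigma_a^2 \sum_j c_j c_{j+k}$. The Python sketch below does this for the multiplicative model $w_t = (1 - \theta B)(1 - \Theta B^s)a_t$. The function names and parameter values are hypothetical and chosen only for illustration, and $s \geq 3$ is assumed.

```python
def ma_autocovariances(coeffs, max_lag):
    """Autocovariances (divided by sigma_a^2) of w_t = sum_j coeffs[j] * a_{t-j}."""
    q = len(coeffs)
    return [sum(coeffs[j] * coeffs[j + k] for j in range(q - k)) if k < q else 0.0
            for k in range(max_lag + 1)]

def multiplicative_ma(theta, big_theta, s):
    """Coefficients of (1 - theta*B)(1 - Theta*B^s), indexed by power of B."""
    c = [0.0] * (s + 2)
    c[0] = 1.0
    c[1] = -theta
    c[s] = -big_theta
    c[s + 1] = theta * big_theta
    return c

# Usage: monthly model with theta = 0.4 and Theta = 0.6.
gamma = ma_autocovariances(multiplicative_ma(0.4, 0.6, 12), 14)
# gamma[11] and gamma[13] agree, and gamma[2..10] are zero.
```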
EXERCISES

9.1. Show that the seasonal difference operator $1 - B^{12}$, often useful in the analysis of
monthly data, may be factorized as follows:

$$(1 - B^{12}) = (1 + B)(1 - \sqrt{3}B + B^2)(1 - B + B^2)(1 + B^2)(1 + B + B^2)(1 + \sqrt{3}B + B^2)(1 - B)$$

Plot the zeros of this expression in the unit circle and show by actual numerical
calculation and plotting of the results that the factors in the order given above
correspond to sinusoids with frequencies (in cycles per year) of 6, 1, 2, 3, 4, 5, together
with a constant term. [For example, the difference equation $(1 - B + B^2)x_t = 0$ with
arbitrary starting values $x_1 = 0$, $x_2 = 1$ yields $x_3 = 1$, $x_4 = 0$, $x_5 = -1$, and so on,
generating a sine wave of frequency 2 cycles per year.]

9.2. A method that has sometimes been used for "deseasonalizing" monthly time series
employs an equally weighted 12-month moving average:

$$\bar{z}_t = \frac{1}{12}(z_t + z_{t-1} + \cdots + z_{t-11})$$

(a) Using the decomposition $(1 - B^{12}) = (1 - B)(1 + B + B^2 + \cdots + B^{11})$, show
that $12(\bar{z}_t - \bar{z}_{t-1}) = (1 - B^{12})z_t$.

(b) The exceedance for a given month over the previous moving average may be
computed as $z_t - \bar{z}_{t-1}$. A quantity $u_t$ may then be calculated that compares the
current exceedance with the average of similar monthly exceedances experienced
over the last $k$ years. Show that $u_t$ may be written as

$$u_t = \left(1 - \frac{B}{12}\,\frac{1 - B^{12}}{1 - B}\right)\left(1 - \frac{B^{12}}{k}\,\frac{1 - B^{12k}}{1 - B^{12}}\right)z_t$$


9.3. It has been shown (Tiao et al., 1975) that monthly averages for the (smog-producing)
oxidant level in Azusa, California, may be represented by the model

$$(1 - B^{12})z_t = (1 + 0.2B)(1 - 0.9B^{12})a_t, \qquad \sigma_a^2 = 1.0$$

(a) Compute and plot the $\psi_j$ weights of this model.

(b) Compute and plot the $\pi_j$ weights of this model.

(c) Calculate the standard deviations of the forecast errors 3 months and 12 months
ahead.

(d) Obtain the eventual forecast function.

9.4. The monthly oxidant averages in parts per hundred million in Azusa from January
1969 to December 1972 were as follows:

        Jan.  Feb.  Mar.  Apr.  May   June  July  Aug.  Sept.  Oct.  Nov.  Dec.
1969    2.1   2.6   4.1   3.9   6.7   5.1   7.8   9.3   7.5    4.1   2.9   2.6
1970    2.0   3.2   3.7   4.5   6.1   6.5   8.7   9.1   8.1    4.9   3.6   2.0
1971    2.4   3.3   3.3   4.0   3.6   6.2   7.7   6.8   5.8    4.1   3.0   1.6
1972    1.9   3.0   4.5   4.2   4.8   5.7   7.1   4.8   4.2    2.3   2.1   1.6

Using the model of Exercise 9.3, compute the forecasts for the next 24 months.
(Approximate unknown $a$'s by zeros.)

9.5. Thompson and Tiao (1971) have shown that the outward station movements of
telephones (logged data) in Wisconsin are well represented by the model

$$(1 - 0.5B^3)(1 - B^{12})z_t = (1 - 0.2B^9 - 0.3B^{12} - 0.2B^{13})a_t$$

Obtain and plot the autocorrelation function of $w_t = (1 - B^{12})z_t$ for lags 1, 2, ..., 24.

9.6. Consider the airline series analyzed earlier in this chapter. We have seen that the
logarithm of the series is well represented by the multiplicative model
$w_t = (1 - \theta B)(1 - \Theta B^{12})a_t$.

(a) Compute and plot the 36-step-ahead forecasts and associated ±2 forecast error 
limits for the logged series. 

(b) Use the results in part (a) to obtain 12-step-ahead forecasts and associated forecast 
error limits for the original series. Plot the results. 

9.7. Quarterly earnings per share of the U.S. company Johnson & Johnson are available 
for the period 1960-1980 as series ’JohnsonJohnson’ in the R datasets package. 

(a) Plot the time series using the graphics capabilities in R. 

(b) Determine a variance stabilizing transformation for the series. 

(c) Plot the autocorrelation functions and identify a suitable model (or models) for 
the series. 

(d) Estimate the parameters of the model (or models) identified in part (c) and assess 
the statistical significance of the estimated parameters. 

(e) Perform diagnostic checks to determine the adequacy of the fitted model. 

(f) Compute and plot the $l$-step-ahead forecasts and associated two-standard-error
prediction limits, $l = 1, \ldots, 4$, for this series.

9.8. Monthly Mauna Loa atmospheric CO2 concentration readings for the period
1959-1997 are available as series 'co2' in the R datasets package.

(a) Plot the time series and comment on the pattern in the data.

(b) Examine the autocorrelation structure and develop a suitable time series model
for this series.

(c) Compute and plot the 12-step-ahead forecasts and associated two-standard-error
prediction limits.

9.9. A time series representing the total monthly electricity generated in the United States
(in millions of kilowatt-hours) for the period January 1970 to December 2005 is
available as series 'electricity' in the R TSA package.

(a) Plot the series and comment. Is a variance stabilizing transformation needed for
this case?

(b) Determine a suitable model for the series following the iterative three-stage
procedure of model identification, parameter estimation, and diagnostic checking.

(c) Is there evidence of a deterministic seasonal pattern in this series? If so, how 
would this impact your choice of model for this series? 

9.10. Consider the time series model $w_t = \beta_0 + N_t$, where $N_t$ follows the AR(1) model
$N_t = \phi N_{t-1} + a_t$. Assume that a series of length $n$ is available for analysis.

(a) Assuming that the parameter $\phi$ is known, derive the generalized least-squares
estimator of the constant $\beta_0$ in this model.

(b) Repeat the derivation in part (a) assuming that $N_t$ follows the seasonal AR model
$N_t = \phi_4 N_{t-4} + a_t$.

9.11. Suppose the quarterly seasonal process $\{z_t\}$ is represented as $z_t = S_t + a_{2t}$, where
$S_t$ follows a "seasonal random walk" model $(1 - B^4)S_t = \theta_0 + a_{1t}$, and $a_{1t}$ and
$a_{2t}$ are independent white noise processes with variances $\sigma_{a_1}^2$ and $\sigma_{a_2}^2$, respectively.
Show that $z_t$ follows the seasonal ARIMA model $(1 - B^4)z_t = \theta_0 + (1 - \Theta B^4)a_t$,
and determine expressions for $\Theta$ and $\sigma_a^2$ in terms of the variance parameters of the
other two processes. Discuss the implication if the resulting value of $\Theta$ is equal (or
very close) to one, with regard to deterministic seasonal components.

9.12. Monthly averages of hourly ozone readings in downtown Los Angeles for the period 
from January 1955 to December 1972 are included as Series R in Part 5 of this book; 
see also http://pages.stat.wisc.edu/reinsel/bjr-data/. 

(a) Plot the time series and comment. 

(b) Develop a suitable time series model for this series. Discuss the adequacy of the
selected model.


10 


ADDITIONAL TOPICS AND EXTENSIONS 


In previous chapters, the properties of linear autoregressive-moving average models have 
been examined extensively and it has been shown how these models can be used to 
represent stationary and nonstationary time series that arise in practice. This chapter will 
discuss additional topics that either supplement or extend the material presented in earlier 
chapters. We begin by discussing unit root tests that can be used as a supplementary tool 
to determine whether a time series is unit root nonstationary and can be transformed to a 
stationary series through differencing. This topic is discussed in Section 10.1. Unit root 
testing has received considerable attention in the econometrics literature, in particular, 
since it appears to be a common starting point for applied research in macroeconomics. For 
example, unit root tests are an integral part of the methodology used to detect long-term 
equilibrium relationships among nonstationary economic time series, commonly referred 
to as cointegration. In Section 10.2, we consider models for conditional heteroscedastic 
time series, which exhibit periods of differing degrees of volatility or variability depending 
on the past history of the series. Such behavior is common in many economic and financial 
time series, in particular. In Section 10.3, we introduce several classes of nonlinear time 
series models, which are capable of capturing some distinctive features in the behavior of 
processes that deviate from linear Gaussian time series. Finally, Section 10.4 looks at models 
for long memory processes, which are characterized by the much slower convergence to 
zero of their autocorrelation function $\rho_k$ as $k \to \infty$ compared with the dependence structure
of ARMA processes.

10.1 TESTS FOR UNIT ROOTS IN ARIMA MODELS

As discussed in earlier chapters, the initial decision concerning the need for differencing
is based, informally, on characteristics of the time series plot of $z_t$ and of its sample
autocorrelation function. In particular, a failure of the autocorrelations $r_k$ to dampen out
sufficiently quickly would indicate that the time series is nonstationary and needs to
be differenced. This can be evaluated further using formal tests for unit roots in the 
autoregressive operator of the model. Testing for unit roots has received considerable 
attention in the time series literature motivated by econometric applications, in particular. 
Early contributions to this area include work by Dickey and Fuller (1979, 1981). These 
authors proposed tests based on the conditional least-squares estimator for an autoregressive 
process and the corresponding “t-statistic.” While the underlying concepts are fairly 
straightforward, a number of challenges arise in practice. In particular, the distribution 
theory for parameter estimates and associated test statistics developed for stationary time 
series do not apply when a unit root is present in the model. The asymptotic distributions 
are functions of standard Brownian motions and do not have convenient closed-form 
expressions. As a result, the percentiles of the distributions needed to perform the tests 
have to be evaluated using numerical approximations or by simulation. Moreover, the form 
of the test statistics and their asymptotic distributions are impacted by the presence of 
deterministic terms such as constants or time trends in the model. The size and power 
characteristics of unit root tests can also be a concern for shorter time series. This section 
provides a brief description of the tests proposed by Dickey and Fuller and summarizes 
some of the subsequent developments. For a more detailed discussion of unit root testing, 
see, for example, Hamilton (1994) and Fuller (1996). Reviews of unit root tests and their 
applications are provided by Dickey et al. (1986), Pantula et al. (1994), Phillips and Xiao 
(1998), and Haldrup et al. (2013), among others. 


10.1.1 Tests for Unit Roots in AR Models 


Simple AR(1) Model. To introduce unit root testing, we first examine the simple AR(1)
model $z_t = \phi z_{t-1} + a_t$, $t = 1, 2, \ldots, n$, with $z_0 = 0$ and no constant term. We are interested
in testing the hypothesis that $\phi = 1$ so that the series follows a random walk. The conditional
least-squares (CLS) estimator of $\phi$ is given by

$$\hat{\phi} = \frac{\sum_{t=2}^{n} z_{t-1} z_t}{\sum_{t=2}^{n} z_{t-1}^2} = \phi + \frac{\sum_{t=2}^{n} z_{t-1} a_t}{\sum_{t=2}^{n} z_{t-1}^2}$$

In the stationary case with $|\phi| < 1$, the statistic $n^{1/2}(\hat{\phi} - \phi)$ has an approximate normal
distribution with zero mean and variance $(1 - \phi^2)$. However, when $\phi = 1$, so that
$z_t = \sum_{j=0}^{t-1} a_{t-j} + z_0$ in integrated form, it can be shown that

$$n(\hat{\phi} - 1) = \frac{n^{-1} \sum_{t=2}^{n} z_{t-1} a_t}{n^{-2} \sum_{t=2}^{n} z_{t-1}^2} = O_p(1)$$

bounded in probability as $n \to \infty$, with both the numerator and denominator possessing
nondegenerate and nonnormal limiting distributions. Hence, in the nonstationary case the
estimator $\hat{\phi}$ approaches its true value $\phi = 1$ with increasing sample size $n$ at a faster rate
than in the stationary case.





The limiting distribution of $n(\hat{\phi} - 1)$ was studied by Dickey and Fuller (1979) who
showed that under the null hypothesis $\phi = 1$

$$n(\hat{\phi} - 1) \xrightarrow{D} \frac{\frac{1}{2}(\Lambda^2 - 1)}{\Gamma} \tag{10.1.1}$$

where $(\Gamma, \Lambda) = \left(\sum_{i=1}^{\infty} \gamma_i^2 Z_i^2, \ \sum_{i=1}^{\infty} 2^{1/2} \gamma_i Z_i\right)$, with $\gamma_i = 2(-1)^{i+1}/[(2i - 1)\pi]$, and the $Z_i$ are
iid $N(0, 1)$ distributed random variables. An equivalent representation for the distribution
is given by

$$n(\hat{\phi} - 1) \xrightarrow{D} \frac{\int_0^1 B(u)\,dB(u)}{\int_0^1 B(u)^2\,du} = \frac{\frac{1}{2}\left(B(1)^2 - 1\right)}{\int_0^1 B(u)^2\,du} \tag{10.1.2}$$

where $B(u)$ is a (continuous-parameter) standard Brownian motion process on [0, 1]; see
Chan and Wei (1988). Such a process is characterized by the properties that $B(0) = 0$,
increments over nonoverlapping intervals are independent, and $B(u + s) - B(s)$ is distributed
as normal $N(0, u)$. Basically, $B(u)$ is the limit as $n \to \infty$ of the process

$$\frac{n^{-1/2}}{\sigma_a} z_{[nu]} = \frac{n^{-1/2}}{\sigma_a} \sum_{t=1}^{[nu]} a_t$$

where $[nu]$ denotes the integer part of $nu$, $0 \leq u \leq 1$.

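By the functional central limit theorem (Billingsley, 1999; Hall and Heyde, 1980,
Section 4.2), $n^{-1/2} z_{[nu]}/\sigma_a$ converges in law as $n \to \infty$ to the standard Brownian motion
process $\{B(u), 0 \leq u \leq 1\}$. The random walk model $z_t = z_{t-1} + a_t$ with $z_0 = 0$ implies that
$z_{t-1} a_t = \frac{1}{2}(z_t^2 - z_{t-1}^2 - a_t^2)$, so that

$$n^{-1} \sum_{t=2}^{n} z_{t-1} a_t = \frac{1}{2}\left(n^{-1} z_n^2 - n^{-1} \sum_{t=1}^{n} a_t^2\right) \xrightarrow{D} \frac{\sigma_a^2}{2}\left[B(1)^2 - 1\right] \tag{10.1.3}$$

since $n^{-1} z_n^2 = \sigma_a^2 (n^{-1/2} z_n/\sigma_a)^2 \xrightarrow{D} \sigma_a^2 B(1)^2$ while $n^{-1} \sum_{t=1}^{n} a_t^2 \xrightarrow{P} \sigma_a^2$ by the law of large
numbers. In addition,

$$n^{-2} \sum_{t=2}^{n} z_{t-1}^2 = \sigma_a^2 \, n^{-1} \sum_{t=2}^{n} \left(\frac{n^{-1/2} z_{t-1}}{\sigma_a}\right)^2 \xrightarrow{D} \sigma_a^2 \int_0^1 B(u)^2\,du \tag{10.1.4}$$

by the continuous mapping theorem (Billingsley, 1999; Hall and Heyde, 1980, p. 276).
Hence, these last two results establish the representation (10.1.2).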

The limiting distribution of $n(\hat{\phi} - 1)$ described above does not have a closed-form
representation but it can be evaluated numerically using simulation. Tables for the percentiles
of the limiting distribution are given by Fuller (1996, Appendix 10.A). Fuller also provides
tables for the limiting distribution of the "Studentized" statistic

$$\hat{\tau} = \frac{\hat{\phi} - 1}{s_a \left(\sum_{t=2}^{n} z_{t-1}^2\right)^{-1/2}} \tag{10.1.5}$$

where $s_a^2 = (n - 2)^{-1}\left(\sum_{t=2}^{n} z_t^2 - \hat{\phi} \sum_{t=2}^{n} z_{t-1} z_t\right)$ is the residual mean square. These results
can be used to test the random walk hypothesis that $\phi = 1$. Since the alternative hypothesis
of stationarity is one-sided, the test rejects $\phi = 1$ when $\hat{\tau}$ is sufficiently negative. The test
based on $\hat{\tau}$ is commonly referred to as the Dickey-Fuller (DF) test in the literature.
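The quantities entering the DF test are simple to compute. The following Python sketch, with hypothetical function names, evaluates $\hat{\phi}$, $n(\hat{\phi} - 1)$, and the Studentized statistic from a series $z_1, \ldots, z_n$ for the model with no constant term, and generates a Gaussian random walk that could be used to approximate the null distribution by simulation. Critical values are not computed here and must still be taken from tables such as those in Fuller (1996).

```python
import math
import random

def dickey_fuller_stats(z):
    """Return (phi_hat, n * (phi_hat - 1), tau_hat) for z = [z_1, ..., z_n]."""
    n = len(z)
    sxy = sum(z[t - 1] * z[t] for t in range(1, n))   # sum of z_{t-1} z_t, t = 2..n
    sxx = sum(z[t - 1] ** 2 for t in range(1, n))     # sum of z_{t-1}^2, t = 2..n
    syy = sum(z[t] ** 2 for t in range(1, n))         # sum of z_t^2, t = 2..n
    phi_hat = sxy / sxx
    s2 = (syy - phi_hat * sxy) / (n - 2)              # residual mean square
    tau_hat = (phi_hat - 1.0) / math.sqrt(s2 / sxx)
    return phi_hat, n * (phi_hat - 1.0), tau_hat

def simulate_random_walk(n, seed=0):
    """Random walk z_t = z_{t-1} + a_t with z_0 = 0 and standard normal a_t."""
    rng = random.Random(seed)
    z, level = [], 0.0
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        z.append(level)
    return z

# Usage: statistics for one simulated random walk of length 500.
phi_hat, normalized_bias, tau_hat = dickey_fuller_stats(simulate_random_walk(500))
```

Repeating the simulation over many seeds and collecting the values of `tau_hat` gives an empirical approximation to the percentiles of the limiting distribution.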


Higher Order AR Models. To extend the results to higher order models, we consider a
generalized AR($p + 1$) process $z_t = \sum_{j=1}^{p+1} \varphi_j z_{t-j} + a_t$, or $\varphi(B)z_t = a_t$, where $\varphi(B)$ contains
a single unit root so that $\varphi(B) = \phi(B)(1 - B)$ and $\phi(B) = 1 - \sum_{j=1}^{p} \phi_j B^j$ is a stationary
AR operator of order $p$. Hence,

$$\varphi(B)z_t = \phi(B)(1 - B)z_t = z_t - z_{t-1} - \sum_{j=1}^{p} \phi_j (z_{t-j} - z_{t-j-1}) = a_t$$

Testing for a unit root in $\varphi(B)$ is then equivalent to testing $\rho = 1$ in the model

$$z_t = \rho z_{t-1} + \sum_{j=1}^{p} \phi_j (z_{t-j} - z_{t-j-1}) + a_t$$

or equivalently testing $\rho - 1 = 0$ in the model

$$(z_t - z_{t-1}) = (\rho - 1)z_{t-1} + \sum_{j=1}^{p} \phi_j (z_{t-j} - z_{t-j-1}) + a_t$$

In fact, for any generalized AR($p + 1$) model $z_t = \sum_{j=1}^{p+1} \varphi_j z_{t-j} + a_t$, it is seen that the
model can be written in an equivalent form as

$$w_t = (\rho - 1)z_{t-1} + \sum_{j=1}^{p} \phi_j w_{t-j} + a_t \tag{10.1.6}$$

where $w_t = z_t - z_{t-1}$, $\rho - 1 = -\varphi(1) = \sum_{j=1}^{p+1} \varphi_j - 1$, and $\phi_j = -\sum_{i=j+1}^{p+1} \varphi_i$. Hence, the
existence of a unit root in the AR operator $\varphi(B)$ is equivalent to $\rho = \sum_{j=1}^{p+1} \varphi_j = 1$.

Based on this last form of the model, let $(\hat{\rho} - 1, \hat{\phi}_1, \ldots, \hat{\phi}_p)$ denote the usual conditional least-squares estimates of the parameters in (10.1.6) obtained by regressing $w_t$ on $z_{t-1}, w_{t-1}, \ldots, w_{t-p}$. Then, under the unit root model where $\rho = 1$ and $\phi(B)$ is stationary, it follows from Fuller (1996, Theorem 10.1.2 and Corollary 10.1.2.1) that

$$\frac{\hat{\rho} - 1}{s_a\left(\sum_{t=p+2}^{n} z_{t-1}^2\right)^{-1/2}}$$

has the same limiting distribution as the Studentized statistic $\hat{\tau}$ in (10.1.5) for the AR(1) model, while $(n - p - 1)(\hat{\rho} - 1)c$, where $c = \sum_{j=0}^{\infty}\psi_j$ with $\psi(B) = \phi^{-1}(B)$, has approximately the same distribution as the statistic $n(\hat{\phi} - 1)$ for the AR(1) model. Also, it follows that the statistic, denoted as $\hat{\tau}$, formed by dividing $(\hat{\rho} - 1)$ by its estimated standard error from the least-squares regression will be asymptotically equivalent to the statistic $(\hat{\rho} - 1)/\{s_a(\sum_{t=p+2}^{n} z_{t-1}^2)^{-1/2}\}$, and hence will have the same limiting distribution as the statistic $\hat{\tau}$ for the AR(1) case; see Said and Dickey (1984).

The test statistic $\hat{\tau}$ formed from the regression of $w_t$ on $z_{t-1}, w_{t-1}, \ldots, w_{t-p}$ as described above can thus be used to test for a unit root in the AR($p+1$) model $\varphi(B)z_t = a_t$. This is the well-known augmented Dickey-Fuller (ADF) test. Furthermore, as shown by Fuller (1996, Theorem 10.1.2), the limiting distribution of the least-squares estimates $(\hat{\phi}_1, \ldots, \hat{\phi}_p)$ for the parameters of the stationary operator $\phi(B)$ in the model is the same as the standard asymptotic distribution for least-squares estimates obtained by regressing the stationary differenced series $w_t$ on $w_{t-1}, \ldots, w_{t-p}$. The estimation results for the stationary AR model discussed earlier in Section 7.2.6 are therefore valid in this case.
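
The regression in (10.1.6) can be set up directly with base R. The sketch below is one possible arrangement, with no constant term; the helper name `adf_tau` and the simulated series are assumptions made for illustration, and the returned value is simply the usual least-squares $t$ ratio on $z_{t-1}$, to be compared with Fuller's percentiles rather than with Student $t$ tables.

```r
# Sketch of the ADF regression (10.1.6); adf_tau() is a hypothetical helper.
adf_tau <- function(z, p) {
  n <- length(z)
  w <- diff(z)                       # w_t = z_t - z_{t-1}
  # Row i of embed(w, p + 1) holds (w_t, w_{t-1}, ..., w_{t-p}) for t = i + p + 1
  W <- embed(w, p + 1)
  y <- W[, 1]                        # response w_t
  z_lag <- z[(p + 1):(n - 1)]        # z_{t-1}, aligned with the rows of W
  X <- if (p > 0) cbind(z_lag, W[, -1, drop = FALSE]) else cbind(z_lag)
  fit <- lm(y ~ 0 + X)               # no constant term in the regression
  coef(summary(fit))[1, "t value"]   # t ratio for the coefficient (rho - 1)
}

set.seed(7)
z_stat <- as.numeric(arima.sim(list(ar = 0.5), n = 500))  # stationary AR(1)
z_rw   <- cumsum(rnorm(500))                              # random walk
adf_tau(z_stat, p = 2)   # strongly negative: unit root rejected
adf_tau(z_rw,   p = 2)   # compare with the lower percentiles of tau-hat
```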


Inclusion of a Constant Term. The results described above extend with suitable modifications to the more practical case where a constant term $\theta_0$ is included in the least-squares regression. Under stationarity, the constant is related to the mean of the process and equals $\theta_0 = (1 - \varphi_1 - \cdots - \varphi_{p+1})\mu = (1 - \rho)\mu$. The least-squares regression yields a test statistic analogous to $\hat{\tau}$ above denoted by $\hat{\tau}_\mu$, although the limiting distribution of this test statistic is derived under the assumption that $\theta_0 = 0$ under the null hypothesis $\phi = 1$. For example, for the AR(1) model $z_t = \phi z_{t-1} + \theta_0 + a_t$ with $\theta_0 = (1 - \phi)\mu$, the least-squares estimator for $\phi$ is

$$\hat{\phi}_\mu = \frac{\sum_{t=2}^{n}(z_{t-1} - \bar{z}_{(1)})(z_t - \bar{z}_{(0)})}{\sum_{t=2}^{n}(z_{t-1} - \bar{z}_{(1)})^2} \tag{10.1.7}$$


where $\bar{z}_{(i)} = (n-1)^{-1}\sum_{t=2}^{n} z_{t-i}$, $i = 0, 1$, so that $\hat{\phi}_\mu = \phi + \sum_{t=2}^{n}(z_{t-1} - \bar{z}_{(1)})a_t \big/ \sum_{t=2}^{n}(z_{t-1} - \bar{z}_{(1)})^2$. When $\phi = 1$, the representation for the limiting distribution of $n(\hat{\phi}_\mu - 1)$ analogous to (10.1.2) is given by

$$\frac{\int_0^1 B(u)\,dB(u) - \xi B(1)}{\int_0^1 B(u)^2\,du - \xi^2} \tag{10.1.8}$$

where $\xi = \int_0^1 B(u)\,du$, and it is assumed that $\theta_0 = (1 - \phi)\mu = 0$ when $\phi = 1$. The corresponding Studentized test statistic for $\phi = 1$ in the AR(1) case is

$$\hat{\tau}_\mu = \frac{\hat{\phi}_\mu - 1}{s_a\left[\sum_{t=2}^{n}(z_{t-1} - \bar{z}_{(1)})^2\right]^{-1/2}} \tag{10.1.9}$$


The limiting distribution of $\hat{\tau}_\mu$ readily follows from the result in (10.1.8). Tables of percentiles of the distribution of $\hat{\tau}_\mu$ when $\phi = 1$ are provided by Fuller (1996, p. 642). Note that under $\phi = 1$, since $z_t = \sum_{j=0}^{t-1} a_{t-j} + z_0$ in the truncated random shock or integrated form, the terms $z_t - \bar{z}_{(0)}$ and $z_{t-1} - \bar{z}_{(1)}$ do not involve the initial value $z_0$. Therefore, the distribution theory for the least-squares estimator does not depend on any assumption concerning $z_0$. Also, the results for the first-order AR(1) model with a constant term extend to higher order autoregressive models in much the same way as they do when the constant term $\theta_0$ is absent from the model. The tables developed for the percentiles of the limiting distribution of the statistic $\hat{\tau}_\mu$ can thus be used for higher order AR models as well.

The procedures described above are based on conditional LS estimation or equivalently on the conditional likelihood assuming that the noise term $a_t$ follows a normal distribution. Pantula et al. (1994) studied unconditional likelihood estimation for the AR model with a unit root. They showed that the limiting distributions of estimators and test statistics for unit root based on the unconditional likelihood are different from those based on the conditional approach. For example, in the simple AR(1) model $z_t = \phi z_{t-1} + a_t$ with no constant term included in the estimation, the unconditional log-likelihood is

$$l(\phi, \sigma_a^2) = -\frac{n}{2}\ln(\sigma_a^2) + \frac{1}{2}\ln(1 - \phi^2) - \frac{1}{2\sigma_a^2}\left[\sum_{t=2}^{n}(z_t - \phi z_{t-1})^2 + (1 - \phi^2)z_1^2\right]$$

as shown in Appendix A7.4. The unconditional ML estimator $\hat{\phi}$, which maximizes $l(\phi, \sigma_a^2)$, is a root of the cubic equation in $\phi$ given by (A7.4.20). Pantula et al. (1994) derived the asymptotic distribution of $n(\hat{\phi} - 1)$ and concluded, using Monte Carlo studies, that tests for unit root in AR models based on the unconditional maximum likelihood estimator are more powerful than those based on the conditional maximum likelihood estimator for moderate values of $n$.


Processes with Deterministic Linear Trend. The asymptotic distribution theory related to the least-squares estimator $\hat{\phi}_\mu$ in (10.1.7) depends heavily on the condition that the constant term $\theta_0$ is zero under the null hypothesis $\phi = 1$, since the behavior of the process $z_t = z_{t-1} + \theta_0 + a_t$ differs fundamentally between the cases $\theta_0 = 0$ and $\theta_0 \neq 0$. When $\theta_0 = 0$, the process is a random walk with zero drift. When $\theta_0 \neq 0$, the model can be written as $z_t = \theta_0 t + z_0 + u_t$, where $u_t = u_{t-1} + a_t$. The process $\{z_t\}$ is now a random walk with drift and its long-term behavior in many respects is dominated by the deterministic linear trend term $\theta_0 t$ contained in $z_t$. If $\theta_0$ has a nonzero value under the hypothesis $\phi = 1$, then $n^{3/2}(\hat{\phi}_\mu - 1)$ converges in distribution to $N(0, 12\sigma_a^2/\theta_0^2)$ as $n \to \infty$. Thus, when $\theta_0 \neq 0$ the asymptotic normal distribution theory applies to the least-squares estimator $\hat{\phi}_\mu$ and to the corresponding test statistic $\hat{\tau}_\mu$. For details, see Fuller (1996, Section 10.1.2) and Hamilton (1994, Section 17.4).

For a time series that exhibits a persistent trend, it is often of interest to determine whether the trend arises from the drift term of a random walk or is due to a deterministic trend added to a stationary AR(1) model, for example. The previous formulation of the AR(1) model with nonzero constant $z_t = \phi z_{t-1} + \theta_0 + a_t$ does not allow this, since when $|\phi| < 1$ this model implies a process with constant mean $\mu = E[z_t] = \theta_0/(1 - \phi)$, independent of time. An alternate formulation of the AR(1) model that allows for a deterministic linear time trend that is not linked to $\phi$ is

$$z_t = \alpha + \theta_0 t + u_t \quad \text{where } u_t = \phi u_{t-1} + a_t, \quad t = 1, \ldots, n \tag{10.1.10}$$

This model has a linear trend with slope $\theta_0 \neq 0$ regardless of whether $\phi = 1$ or $\phi \neq 1$. It is of interest to note the relation between parameters in this form relative to the previous form. Applying the operator $(1 - \phi B)$ to (10.1.10), the model can be expressed as

$$z_t = \phi z_{t-1} + \alpha_0 + \delta_0 t + a_t \tag{10.1.11}$$

where $\alpha_0 = \alpha(1 - \phi) + \phi\theta_0$ and $\delta_0 = \theta_0(1 - \phi)$. Hence, in this form $\alpha_0 = \theta_0$ and $\delta_0 = 0$ are obtained under $\phi = 1$, so that $z_t = z_{t-1} + \theta_0 + a_t$. The presence of the linear time trend in (10.1.10) thus leads to a model with a nonzero constant but a zero coefficient for the time trend under the null hypothesis $\phi = 1$. The constant $\theta_0$ is referred to as a drift term and measures the expected change in the series when the time increases by one unit.
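
The parameter mapping between (10.1.10) and (10.1.11) can be checked numerically. The following sketch generates a series from (10.1.10) and confirms that it satisfies (10.1.11) with $\alpha_0 = \alpha(1 - \phi) + \phi\theta_0$ and $\delta_0 = \theta_0(1 - \phi)$; the parameter values, the starting value $u_1 = a_1$, and the seed are arbitrary choices for illustration.

```r
# Numerical check of the mapping from (10.1.10) to (10.1.11).
# All parameter values are illustrative assumptions.
set.seed(1)
n <- 100
alpha <- 2; theta0 <- 0.5; phi <- 0.7

a <- rnorm(n)
u <- numeric(n)
u[1] <- a[1]                                   # arbitrary starting value
for (k in 2:n) u[k] <- phi * u[k - 1] + a[k]   # u_t = phi * u_{t-1} + a_t
time_idx <- 1:n
z <- alpha + theta0 * time_idx + u             # model (10.1.10)

alpha0 <- alpha * (1 - phi) + phi * theta0
delta0 <- theta0 * (1 - phi)

# Right-hand side of (10.1.11) for t = 2, ..., n
rhs <- phi * z[1:(n - 1)] + alpha0 + delta0 * time_idx[2:n] + a[2:n]
max(abs(z[2:n] - rhs))                         # zero up to rounding error
```

Setting `phi <- 1` in this sketch gives `alpha0 = theta0` and `delta0 = 0`, the random walk with drift described in the text.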

A common procedure to test for a unit root in this model is to perform least-squares estimation with the linear trend term $t$ in addition to the constant included in the regression. The resulting estimator of $\phi$, denoted as $\hat{\phi}_\tau$, is such that the limiting distribution of $n(\hat{\phi}_\tau - 1)$, under $\phi = 1$, does not depend on the value of the constant $\alpha_0 = \theta_0$ but still requires the coefficient $\delta_0$ of the time variable $t$ to be zero under the null hypothesis. Hence, this estimator $\hat{\phi}_\tau$ can be used as the basis of a valid test of $\phi = 1$ regardless of the value of the constant $\theta_0$. Tables of percentiles of the null distribution of $n(\hat{\phi}_\tau - 1)$ and of the corresponding Studentized statistic $\hat{\tau}_\tau$ are available in Fuller (1996, p. 642).

Alternative procedures to test $\phi = 1$ in the presence of a possible deterministic linear trend, which are valid regardless of the value of the constant term, have been proposed by several authors. Bhargava (1986) developed a locally most powerful invariant test for unit roots. Schmidt and Phillips (1992) used a score (or Lagrange multiplier (LM)) test for the model (10.1.10), and Ahn (1993) extended this approach to allow for a more general ARMA model for the noise process $u_t$. Elliott et al. (1996) used a point optimal testing approach with maximum power against a local alternative for the same model. The power gains were obtained by a preliminary generalized least-squares (GLS) detrending procedure using a local alternative to $\phi = 1$, followed by use of the least-squares estimate $\hat{\phi}$ and corresponding test statistic $\hat{\tau}$ obtained from the detrended series. Subsequent contributions to this area include work by Ng and Perron (2001), Perron and Qu (2007), and Harvey et al. (2009), among others.

10.1.2 Extensions of Unit Root Testing to Mixed ARIMA Models 

The test procedures described above and other similar ones have been extended to testing 
for unit roots in mixed ARIMA(p, 1, q) models (e.g., see Said and Dickey (1984, 1985) 
and Solo (1984b)), as well as models with higher order differencing (e.g., see Dickey 
and Pantula (1987)). Said and Dickey (1984) showed that the Dickey-Fuller procedure,
which was originally developed for autoregressive models of known order $p$, remains valid
asymptotically for an ARIMA($p$, 1, $q$) model where $p$ and $q$ are unknown. The authors
approximated the mixed model by an autoregressive model of sufficiently high order and
applied the ADF test to the resulting AR model. The approximation assumes that the
lag length of the autoregression increases with the length of the series, $n$, at a controlled
rate less than $n^{1/3}$. Phillips (1987) and Phillips and Perron (1988) proposed a number of
unit root tests that have become popular in the econometrics literature. These tests differ 
from the ADF tests in how they deal with serial correlation and heteroscedasticity in the 
error process. Thus, while the ADF tests approximate the ARMA structure by a high- 
order autoregression, the Phillips and Perron tests deal with serial correlation by directly 
modifying the test statistics to account for serial correlation. Likelihood ratio type of unit 
root tests have also been considered for the mixed ARIMA model based on both conditional 
and unconditional normal distribution likelihoods by Yap and Reinsel (1995) and Shin and 
Fuller (1998), among others. Simulation studies suggest that these tests often perform better
than $\hat{\tau}$-type test statistics for mixed ARIMA models.

Motivated by problems in macroeconomics and related fields, the literature has continued 
to grow and many other extensions have been developed. These include the use of bootstrap 
methods for statistical inference as discussed, for example, by Palm et al. (2008). The use 
of Bayesian methods for unit root models has also been considered. The problem of 
distinguishing unit root nonstationary series from series with structural breaks such as level 
shifts or trend changes has been considered by many researchers. The methodology has 
also been extended and modified to deal with more complex series involving nonlinearities, 
time-varying volatility, and fractionally integrated processes with long-range dependence. 
Tests with a null hypothesis of stationarity, rather than unit root nonstationarity, have also 
been proposed in the literature. For further discussion and references, see, for example, 
Phillips and Xiao (1998) and Haldrup et al. (2013). 

Example: Series C. To illustrate unit root testing, consider the series of temperature readings referred to as Series C. Two potential models identified for this series in Chapter 6 were the ARIMA(1, 1, 0) and the ARIMA(0, 2, 0). Since there is some doubt about the need for the second differencing in the ARIMA(0, 2, 0) model, with the alternative model being a stationary AR(1) for the first differences, we investigate this more formally. The AR(1) model $\nabla z_t = \phi\nabla z_{t-1} + a_t$ for the first differences can be written as $\nabla^2 z_t = (\phi - 1)\nabla z_{t-1} + a_t$, and in this form the conditional least-squares regression estimate $\hat{\phi} - 1 = -0.187$ is obtained, with an estimated standard error of 0.038, and $\hat{\sigma}_a^2 = 0.018$. Note that this implies $\hat{\phi} = 0.813$, similar to results in Tables 6.5 and 7.6. The Studentized statistic to test $\phi = 1$ is $\hat{\tau} = -4.87$, which is far more negative than the lower one percentage point of $-2.58$ for the distribution of $\hat{\tau}$ in the tables of Fuller (1996). Also, $\hat{\tau}_\mu = -4.96$ was obtained when a constant term is included in the AR(1) model for $\nabla z_t$. Hence, these estimation results do not support the need for second differencing and point to a preference for the ARIMA(1, 1, 0) model.

Implementation in R. Tests for unit roots can be performed using the package fUnitRoots 
available in the FinTS package in R. If z represents the time series of interest, the command 
used to perform the augmented Dickey-Fuller test is 

> adfTest(z,lags,type=c("nc","c","ct")) 

where lags denotes the number of lags in the autoregressive model and type indicates whether or not a constant or trend should be included in the fitted model. The argument "nc" specifies that no constant should be included in the model, "c" is used for constant only, and "ct" specifies a trend plus a constant. For lags equal to 0, the test is the original Dickey-Fuller test. Otherwise, lags represents the order of the stationary autoregressive polynomial in (10.1.6). For a mixed ARMA model, it represents the order of the autoregressive approximation to this model.

The calculations for Series C described above can be performed in R as follows: 

> library(fUnitRoots) 

> adfTest(diff(ts(seriesC)),0,type=c("nc")) 

Title: Augmented Dickey-Fuller Test 

Test Results: 
PARAMETER: 

Lag Order: 0 

STATISTIC: Dickey-Fuller: -4.8655 

P VALUE: 0.01 

> adfTest(diff(ts(seriesC)),0,type=c("c")) 

Title: Augmented Dickey-Fuller Test 
Test Results: 

PARAMETER: 

Lag Order: 0 

STATISTIC: Dickey-Fuller: -4.962 

P VALUE: 0.01 

The values of the test statistics agree in both cases with those quoted in the example. Note 
that the output shows the p value but does not give the critical value for the test. If the 
critical values are needed, they can be obtained in R using the command 

> adfTable(trend=c("nc","c","ct"), statistic=c("t","n")) 

Example: Series A. For further illustration, consider Series A that represents concentration readings of a chemical process at 2-hour intervals and has $n = 197$ observations. In Chapters 6 and 7, two possible ARMA/ARIMA models were proposed for this series. One is the nearly nonstationary ARMA(1, 1) model, $(1 - \phi B)z_t = \theta_0 + (1 - \theta B)a_t$, with estimates $\hat{\phi} = 0.92$, $\hat{\theta} = 0.58$, $\hat{\theta}_0 = 1.45$, and $\hat{\sigma}_a^2 = 0.0974$. The second is the nonstationary ARIMA(0, 1, 1) model, $(1 - B)z_t = (1 - \theta B)a_t$, with estimates $\hat{\theta} = 0.71$ and $\hat{\sigma}_a^2 = 0.1004$. Below we use the ADF test to test the hypothesis that differencing is needed so that the series follows the ARIMA(0, 1, 1) model. To determine the order $k$ of the autoregressive approximation to this model, we first apply the R command ar() to the differenced series to select a suitable value for $k$ based on the AIC criterion. The output suggests an AR(6) model, which is then used for the test. A slightly different choice of $k$ does not alter the conclusion.


> library(fUnitRoots) 

> ar(diff(ts(seriesA)),aic=TRUE) 

Call: ar (x = diff(ts(seriesA)), aic = TRUE) 
Coefficients : 

1 2 3 4 5 6 

-0.6098 -0.3984 -0.3585 -0.3175 -0.3142 -0.2139 
Order selected 6 sigma~2 estimated as 0.09941 

> adfTest(ts(seriesA),6,type=c("nc")) 

Title: Augmented Dickey-Fuller Test 
Test Results: 

PARAMETER: 

Lag Order: 6 

STATISTIC: Dickey-Fuller: 0.6271 

P VALUE: 0.8151 
The p value is large and the test does not reject the null hypothesis that the series 
needs to be differenced, suggesting that ARIMA(0, 1, 1) is the preferred model. A similar 
conclusion was reached by Solo (1984b) who used a Lagrange multiplier test to determine 
the need for differencing. 


10.2 CONDITIONAL HETEROSCEDASTIC MODELS 

This section presents an overview of some models that have been developed to describe 
time-varying variability or volatility in a time series. To first introduce some notation, we 
note that the ARMA($p$, $q$) process $\phi(B)z_t = \theta_0 + \theta(B)a_t$ can be written as the sum of a 
predictable part and a prediction error as 

$$z_t = E[z_t \mid F_{t-1}] + a_t$$

where $F_{t-1}$ represents the past information available at time $t - 1$ and $a_t$ represents the 
prediction error. For the ARMA model, $F_{t-1}$ is a function of past observations and past error 
terms, but could more generally include external regression variables $X_t$. The assumption 
made thus far is that the prediction errors $a_t$ are independent random variables with a 
constant variance $\mathrm{Var}[a_t] = \sigma_a^2$ that is independent of the past. However, this assumption 
appears inconsistent with the heteroscedasticity often seen for time series in business and 
economics, in particular. For example, financial time series such as stock returns often 
exhibit periods when the volatility is high and periods when it is lower. This characteristic 
feature, or stylized fact, is commonly referred to as volatility clustering. For illustration, Figure 10.1(a) shows the weekly S&P 500 Index over the period January 3, 2000 to May 27, 2014 for a total of 751 observations. The log returns calculated as $\ln(p_t/p_{t-1}) = \ln(p_t) - \ln(p_{t-1})$, where $p_t$ represents the original time series, are shown in Figure 10.1(b). We note that while the original time series is nonstationary, the returns fluctuate around a stable mean level. However, the variability around the mean changes and volatility clusters are clearly visible. Note the high volatility during and following the 2008 financial crisis, in particular. Another common feature of financial time series is that the marginal distributions are leptokurtic and tend to have heavier tails than those of a normal distribution. A number of other stylized facts have been documented and investigated for financial data (for discussion and references, see, for example, Terasvirta et al., 2010, Chapter 8).

FIGURE 10.1 (a) Time plot of the weekly S&P 500 Index from January 3, 2000 to May 27, 2014, and (b) the weekly log returns on the S&P 500 Index.
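
The log-return transformation described above amounts to differencing the logged series. A minimal sketch in base R follows; the helper name `log_returns` and the short price vector are made up for illustration and are not the S&P 500 data.

```r
# Log returns ln(p_t / p_{t-1}) = ln(p_t) - ln(p_{t-1}).
# log_returns() and the price vector are illustrative assumptions.
log_returns <- function(p) diff(log(p))

p <- c(100, 102, 101, 105)            # made-up price levels
r <- log_returns(p)
r
# The two forms of the definition agree
all.equal(r, log(p[-1] / p[-length(p)]))
```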

The autoregressive conditional heteroscedastic (ARCH) model was introduced by Engle 
(1982) to describe time-varying variability in a series of inflation rates. An extension of this 
model called the generalized autoregressive conditional heteroscedastic (GARCH) model was proposed by 
Bollerslev (1986). These models are capable of describing not only volatility clustering but 
also features such as heavy-tailed behavior that is common in many economic and financial 
time series. Still, there are other features related to volatility that are not captured by the 
basic ARCH and GARCH models. This has led to a number of extensions and alternative 
formulations aimed at addressing these issues. This section presents a brief description of 
the ARCH and GARCH models along with some extensions proposed in the literature. 
The literature in this area is extensive and only a select number of developments will be 
discussed. A more complete coverage can be found in survey papers by Bollerslev et al. 
(1992, 1994), Bera and Higgins (1993), Li et al. (2003), and Terasvirta (2009), among 
others. Volatility modeling is also discussed in several time series texts, including Franses 
and van Dijk (2000), Mills and Markellos (2008), Terasvirta et al. (2010), and Tsay (2010). 
Textbooks devoted to volatility modeling include Francq and Zakoian (2010) and Xekalaki 
and Degiannakis (2010). 

10.2.1 The ARCH Model 

For a stationary ARMA process, the unconditional mean of the series is constant over time while the conditional mean $E[z_t \mid F_{t-1}]$ varies as a function of past observations. Parallel to this, the ARCH model assumes that the unconditional variance of the error process is constant over time but allows the conditional variance of $a_t$ to vary as a function of past squared errors. Letting $\sigma_t^2 = \mathrm{var}[a_t \mid F_{t-1}]$ denote the conditional variance of $a_t$, given the past $F_{t-1}$, the basic ARCH($s$) model can be formulated as

$$a_t = \sigma_t e_t \tag{10.2.1}$$

where $\{e_t\}$ is a sequence of iid random variables with mean zero and variance 1, and

$$\sigma_t^2 = \alpha_0 + \alpha_1 a_{t-1}^2 + \cdots + \alpha_s a_{t-s}^2 \tag{10.2.2}$$

with $\alpha_0 > 0$, $\alpha_i \geq 0$, for $i = 1, \ldots, s - 1$, and $\alpha_s > 0$. The parameter constraints are imposed to ensure that the conditional variance $\sigma_t^2$ is positive. The additional constraint $\sum_{i=1}^{s}\alpha_i < 1$ ensures that the $a_t$ are covariance stationary with finite unconditional variance $\sigma_a^2$. For some time series, such as stock returns, the original observations are typically serially uncorrelated and the $a_t$ are observed directly. Alternatively, the $a_t$ can be the noise sequence associated with an ARMA or regression-type model. For modeling purposes, the $e_t$ in (10.2.1) are usually assumed to follow a standard normal or a Student $t$-distribution.

The ARCH model was used by Engle (1982) to study the variance of UK inflation rates 
and by Engle (1983) to describe the variance of U.S. inflation rates. The ARCH model 
and its later extensions by Bollerslev (1986) and others quickly found other applications. 
For example, Diebold and Nerlove (1989) showed that the ARCH model may be used to 
generate statistically and economically meaningful measures of exchange rate volatility. 
Bollerslev (1987) used the GARCH extension of the ARCH model to analyze the conditional 
volatility of financial returns observed at a monthly or higher frequency. In Weiss 
(1984), ARMA models with ARCH errors were used to model the time series behavior of 13 
different U.S. macroeconomic time series. Bollerslev et al. (1992) describe a large number 
of other applications in their review of volatility models. While a majority of applications 
have been in finance and economics, the models have also been used in other fields. For 
example, Campbell and Diebold (2005) used volatility models in their analysis of the daily 
average temperatures for four U.S. cities. The models have also been used for variables 
such as wind speeds, air quality measurements, earthquake series, and in the analysis of 
speech signals. For selected references, see Francq and Zakoian (2010, p. 12). 

Some Properties of the ARCH Model. To establish some properties of the ARCH model, we first examine the ARCH(1) model where

$$\sigma_t^2 = \mathrm{var}[a_t \mid F_{t-1}] = E[a_t^2 \mid F_{t-1}] = \alpha_0 + \alpha_1 a_{t-1}^2 \tag{10.2.3}$$

with $\alpha_0 > 0$ and $\alpha_1 \geq 0$. The form of the model shows that the conditional variance $\sigma_t^2$ will be large if $a_{t-1}$ was large in absolute value and vice versa. A large (small) value of $\sigma_t^2$ will in turn tend to generate a large (small) value of $a_t$, thus giving rise to volatility clustering.

It follows from (10.2.1) that $E[a_t \mid F_{t-1}] = 0$. The unconditional mean of $a_t$ is also zero since

$$E[a_t] = E\left[E[a_t \mid F_{t-1}]\right] = 0$$

Furthermore, the $a_t$ are serially uncorrelated since for $j > 0$,

$$E[a_t a_{t-j}] = E\left[E[a_t a_{t-j} \mid F_{t-1}]\right] = E\left[a_{t-j}E[a_t \mid F_{t-1}]\right] = 0$$

But the $a_t$ are not mutually independent since they are interrelated through their conditional variances. The lack of serial correlation is an important property that makes the ARCH model suitable for modeling asset returns that are expected to be uncorrelated by the efficient market hypothesis.

We also assume that the $a_t$ have equal unconditional variances, $\mathrm{var}[a_t] = E[a_t^2] = \sigma_a^2$, for all $t$, so that the process is weakly stationary. If $\alpha_1 < 1$, the unconditional variance exists and equals

$$\sigma_a^2 = \mathrm{var}[a_t] = \frac{\alpha_0}{1 - \alpha_1} \tag{10.2.4}$$

This follows since

$$\sigma_a^2 = E[a_t^2] = E\left[E[a_t^2 \mid F_{t-1}]\right] = E[\alpha_0 + \alpha_1 a_{t-1}^2] = \alpha_0 + \alpha_1\sigma_a^2$$

Further substituting $\alpha_0 = \sigma_a^2(1 - \alpha_1)$ from (10.2.4) into (10.2.3), we see that

$$\sigma_t^2 = \sigma_a^2 + \alpha_1(a_{t-1}^2 - \sigma_a^2) \tag{10.2.5}$$

or, equivalently, $\sigma_t^2 - \sigma_a^2 = \alpha_1(a_{t-1}^2 - \sigma_a^2)$. Hence, the conditional variance of $a_t$ will be above the unconditional variance whenever $a_{t-1}^2$ is larger than the unconditional variance $\sigma_a^2$.
To study the tail behavior of $a_t$, we examine the fourth moment $\mu_4 = E[a_t^4]$. If $a_t$ is normally distributed, conditional on the past, then

$$E[a_t^4 \mid F_{t-1}] = 3\sigma_t^4 = 3(\alpha_0 + \alpha_1 a_{t-1}^2)^2$$

Therefore, the fourth unconditional moment of $a_t$ satisfies

$$E[a_t^4] = E\left[E[a_t^4 \mid F_{t-1}]\right] = 3\left[\alpha_0^2 + 2\alpha_0\alpha_1 E[a_{t-1}^2] + \alpha_1^2 E[a_{t-1}^4]\right]$$

Thus, if $\{a_t\}$ is fourth-order stationary so that $\mu_4 = E[a_t^4] = E[a_{t-1}^4]$, then

$$\mu_4 = \frac{3(\alpha_0^2 + 2\alpha_0\alpha_1\sigma_a^2)}{1 - 3\alpha_1^2} = \frac{3\alpha_0^2(1 - \alpha_1^2)}{(1 - \alpha_1)^2(1 - 3\alpha_1^2)} \tag{10.2.6}$$

Since $\mu_4 = E[a_t^4] > 0$, this expression shows that $\alpha_1$ must satisfy $0 \leq \alpha_1 < 1/\sqrt{3}$ in order for $a_t$ to have finite fourth moment. Further, if $\kappa$ denotes the unconditional kurtosis of $a_t$, then

$$\kappa = \frac{E[a_t^4]}{(E[a_t^2])^2} = \frac{3(1 - \alpha_1^2)}{1 - 3\alpha_1^2}$$

This value exceeds 3, the kurtosis of the normal distribution, whenever $\alpha_1 > 0$. Hence, the marginal distribution of $a_t$ has heavier tails than those of the normal distribution. This is an additional feature of the ARCH model that makes it useful for modeling financial asset returns where heavy-tailed behavior is the norm.
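
To see how quickly the tails thicken as $\alpha_1$ grows, the kurtosis expression above can be evaluated directly. The sketch below assumes conditional normality, as in the derivation; the function name `arch1_kurtosis` is a hypothetical helper introduced here for illustration.

```r
# Unconditional kurtosis of an ARCH(1) process with conditionally normal errors.
# arch1_kurtosis() is a hypothetical helper name.
arch1_kurtosis <- function(alpha1) {
  # The fourth moment is finite only for 0 <= alpha1 < 1/sqrt(3)
  stopifnot(alpha1 >= 0, alpha1 < 1 / sqrt(3))
  3 * (1 - alpha1^2) / (1 - 3 * alpha1^2)
}

arch1_kurtosis(0)      # 3: no ARCH effect, the normal value
arch1_kurtosis(0.3)    # moderately above 3
arch1_kurtosis(0.5)    # 9: kurtosis grows rapidly as alpha1 nears 1/sqrt(3)
```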

To derive an alternative form of the ARCH process, we let $v_t = a_t^2 - \sigma_t^2$, so that $a_t^2 = \sigma_t^2 + v_t$. The random variables $v_t$ then have zero mean and they are serially uncorrelated since

$$E[(a_t^2 - \sigma_t^2)(a_{t-j}^2 - \sigma_{t-j}^2)] = E\left[E\{(a_t^2 - \sigma_t^2)(a_{t-j}^2 - \sigma_{t-j}^2) \mid F_{t-1}\}\right] = E\left[(a_{t-j}^2 - \sigma_{t-j}^2)E\{(a_t^2 - \sigma_t^2) \mid F_{t-1}\}\right] = 0$$

Further, since $\sigma_t^2 = \alpha_0 + \alpha_1 a_{t-1}^2$, we find that the ARCH(1) model can be written as

$$a_t^2 = \alpha_0 + \alpha_1 a_{t-1}^2 + v_t \tag{10.2.7}$$

This form reveals that the process of squared errors $a_t^2$ can be viewed as an AR(1) model with uncorrelated innovations $v_t$. The innovations are heteroscedastic and also non-Gaussian in this case, however.
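
These properties can be illustrated by simulation. The sketch below generates an ARCH(1) series with standard normal $e_t$ and compares the sample variance with (10.2.4) and the lag 1 autocorrelations of $a_t$ and $a_t^2$ with the values implied by the theory (zero and $\alpha_1$, respectively); the function name `simulate_arch1`, the parameter values, the burn-in length, and the seed are illustrative assumptions.

```r
# Simulation sketch of an ARCH(1) process with standard normal e_t.
# simulate_arch1() and all settings are illustrative assumptions.
simulate_arch1 <- function(n, alpha0, alpha1, burn = 500) {
  e <- rnorm(n + burn)
  a <- numeric(n + burn)
  sig2 <- alpha0 / (1 - alpha1)      # start at the unconditional variance
  for (k in seq_len(n + burn)) {
    if (k > 1) sig2 <- alpha0 + alpha1 * a[k - 1]^2   # (10.2.3)
    a[k] <- sqrt(sig2) * e[k]                         # (10.2.1)
  }
  a[(burn + 1):(n + burn)]           # drop the burn-in period
}

set.seed(42)
a <- simulate_arch1(20000, alpha0 = 1, alpha1 = 0.3)

var(a)                                        # compare with 1 / (1 - 0.3)
acf(a,   lag.max = 1, plot = FALSE)$acf[2]    # a_t: close to zero
acf(a^2, lag.max = 1, plot = FALSE)$acf[2]    # a_t^2: compare with alpha1 = 0.3
```

The squared series behaves like an AR(1) process even though the series itself is serially uncorrelated, which is the pattern that motivates examining the autocorrelations of squared residuals when checking for ARCH effects.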

For the ARCH($s$) model in (10.2.2), we similarly have

$$a_t^2 = \alpha_0 + \alpha_1 a_{t-1}^2 + \cdots + \alpha_s a_{t-s}^2 + v_t$$

so that $a_t^2$ has the form of an AR($s$) process. Other results related to the moments and the kurtosis of the ARCH(1) model also extend to higher order ARCH models. In particular, if $\sum_{i=1}^{s}\alpha_i < 1$, then the unconditional variance is

$$\sigma_a^2 = \frac{\alpha_0}{1 - \sum_{i=1}^{s}\alpha_i}$$

as shown by Engle (1982). Necessary and sufficient conditions for the existence of higher order even moments of the ARCH($s$) process were given by Milhøj (1985).

Forecast Errors for the ARCH Model. Forecasts of a future value $z_{t+l}$ generated from ARMA models with iid errors $a_t$ have forecast error variances that depend on the lead time $l$ but are independent of the time origin $t$ from which the forecasts are made. Baillie and Bollerslev (1992) showed that the minimum mean square error forecasts of $z_{t+l}$ are the same irrespective of whether the shocks $a_t$ are heteroscedastic or not. For an ARMA process with ARCH errors, this implies, in particular, that the one-step-ahead forecast error equals $a_{t+1}$ while the $l$-step-ahead forecast error can be written as $e_t(l) = \sum_{j=0}^{l-1}\psi_j a_{t+l-j}$ with $\psi_0 = 1$. The presence of conditional heteroscedasticity will, however, impact the variance of the forecast errors.

For an ARCH(1) process, the conditional variance of the one-step-ahead forecast error $a_{t+1}$ is given by (10.2.5) as

$$E[e_t^2(1) \mid F_t] = \sigma_{t+1}^2 = \sigma_a^2 + \alpha_1(a_t^2 - \sigma_a^2) \tag{10.2.8}$$

The conditional variance of the one-step-ahead forecast error can thus be smaller or larger than the unconditional variance depending on the difference between the last squared error $a_t^2$ and $\sigma_a^2$.

Conditional variances of multistep-ahead forecast errors $e_t(l)$ can also be shown to depend on the past squared errors based on

$$E[e_t^2(l) \mid F_t] = \sum_{j=0}^{l-1}\psi_j^2 E[a_{t+l-j}^2 \mid F_t]$$

where for the ARCH(1) model

$$E[a_{t+h}^2 \mid F_t] = E\left[E(a_{t+h}^2 \mid F_{t+h-1}) \mid F_t\right] = \alpha_0 + \alpha_1 E[a_{t+h-1}^2 \mid F_t] = \alpha_0(1 + \alpha_1 + \cdots + \alpha_1^{h-1}) + \alpha_1^h a_t^2 \quad \text{for } h > 0$$

From this and using (10.2.4) it can be verified that

$$E[e_t^2(l) \mid F_t] = \sigma_a^2\sum_{j=0}^{l-1}\psi_j^2 + \sum_{j=0}^{l-1}\psi_j^2\alpha_1^{l-j}(a_t^2 - \sigma_a^2) \tag{10.2.9}$$

which simplifies to (10.2.8) for $l = 1$. The first term on the right-hand side of this expression is the conventional prediction error variance assuming that the errors $a_t$ are homoscedastic, while the second term reflects the impact of the ARCH effects. This term varies over time and can again be positive or negative depending on the difference $a_t^2 - \sigma_a^2$. The variance of the predicted values thus varies over time and can be larger or smaller than that under homoscedasticity. For the general ARCH($s$) model, the second term on the right-hand side will be a function of $s$ past values $a_t^2, \ldots, a_{t-s+1}^2$.

If the time series $z_t$ follows an AR(1) model, the $\psi$ weights are given by $\psi_j = \phi^j$. If $\phi$ equals zero, so that the mean of the series is a constant independent of the past, expression (10.2.9) simplifies to $\sigma_a^2 + \alpha_1^l(a_t^2 - \sigma_a^2)$. We note that this is the conditional $l$-step-ahead forecast of the conditional variance $\sigma_{t+l}^2$ for the ARCH(1) model. This forecast could be calculated more directly as $E[\sigma_{t+l}^2 \mid F_t] = \alpha_0 + \alpha_1 E[a_{t+l-1}^2 \mid F_t]$, where $E[a_{t+l-1}^2 \mid F_t]$ can be generated recursively from the AR model for $a_t^2$. The result follows by setting $\alpha_0 = \sigma_a^2(1 - \alpha_1)$.
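
The equivalence between the recursive calculation and the closed form $\sigma_a^2 + \alpha_1^l(a_t^2 - \sigma_a^2)$ can be checked with a few lines of R. This is a sketch under the ARCH(1) assumption; the function name `arch1_var_forecast` and the numerical values are illustrative only.

```r
# l-step-ahead forecasts of the ARCH(1) conditional variance from origin t.
# arch1_var_forecast() and the parameter values are illustrative assumptions.
arch1_var_forecast <- function(a_t, alpha0, alpha1, l_max) {
  f <- numeric(l_max)
  prev <- a_t^2                        # E[a_t^2 | F_t] = a_t^2 is known at time t
  for (l in seq_len(l_max)) {
    f[l] <- alpha0 + alpha1 * prev     # E[a_{t+l}^2 | F_t] from the AR(1) form
    prev <- f[l]
  }
  f
}

alpha0 <- 0.2; alpha1 <- 0.5; a_t <- 2
sigma2_a  <- alpha0 / (1 - alpha1)                        # (10.2.4)
recursive <- arch1_var_forecast(a_t, alpha0, alpha1, l_max = 5)
closed    <- sigma2_a + alpha1^(1:5) * (a_t^2 - sigma2_a)
all.equal(recursive, closed)                              # the two forms agree
recursive                                                 # decays toward sigma2_a
```

With a large last shock, as here, the forecasts start above the unconditional variance and decay geometrically toward it at rate $\alpha_1$.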

10.2.2 The GARCH Model 

The ARCH model has a disadvantage in that it often requires a high lag order $s$ to adequately describe the evolution of volatility over time. An extension of the ARCH model called the generalized ARCH, or GARCH, model was introduced by Bollerslev (1986) to overcome this issue. The GARCH($s$, $r$) model assumes that $a_t = \sigma_t e_t$, where the $\{e_t\}$ again are iid random variables with mean zero and variance 1, and where $\sigma_t^2$ is given by

$$\sigma_t^2 = \alpha_0 + \sum_{i=1}^{s}\alpha_i a_{t-i}^2 + \sum_{j=1}^{r}\beta_j\sigma_{t-j}^2 \tag{10.2.10}$$

with $\alpha_0 > 0$, $\alpha_i \geq 0$, $i = 1, \ldots, s - 1$, $\alpha_s > 0$, $\beta_j \geq 0$, $j = 1, \ldots, r - 1$, and $\beta_r > 0$. These parameter constraints are sufficient for the conditional variance $\sigma_t^2$ to be positive. Nelson and Cao (1992) showed that these constraints can be relaxed slightly to allow some of the parameters to be negative while the conditional variance still remains positive. The additional constraint $\sum_{i=1}^{m}(\alpha_i + \beta_i) < 1$, where $m = \max(s, r)$ with $\alpha_i = 0$, for $i > s$, and $\beta_j = 0$, for $j > r$, ensures that the unconditional variance $\sigma_a^2$ is finite.

The simplest and most widely used model in this class is the GARCH(1,1) model where 

$\sigma_t^2 = E[a_t^2 | F_{t-1}] = \alpha_0 + \alpha_1 a_{t-1}^2 + \beta_1 \sigma_{t-1}^2$.

Since the constants $\alpha_1$ and $\beta_1$ are positive, we see that a large value of $a_{t-1}^2$ or $\sigma_{t-1}^2$ results in
a large value of $\sigma_t^2$. As for the ARCH process, this model therefore accounts for volatility
clustering.

Assuming that $\alpha_1 + \beta_1 < 1$, the unconditional variance of $a_t$ is

$\sigma_a^2 = \mathrm{var}[a_t] = \alpha_0 / [1 - (\alpha_1 + \beta_1)]$

Also, assuming that the conditional distributions are normal, the fourth unconditional 
moment of $a_t$ is finite provided that $(\alpha_1 + \beta_1)^2 + 2\alpha_1^2 < 1$ (Bollerslev, 1986). In addition,
the kurtosis of the marginal distribution of $a_t$ equals

$\frac{E(a_t^4)}{[E(a_t^2)]^2} = \frac{3[1 - (\alpha_1 + \beta_1)^2]}{1 - (\alpha_1 + \beta_1)^2 - 2\alpha_1^2} > 3$

As in the ARCH case, the unconditional distribution of $a_t$ thus has heavier tails than
the normal distribution and is expected to give rise to a higher frequency of extreme 
observations or “outliers” than would be the case under normality. 
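
As a rough numerical check of the variance and kurtosis expressions above, the following R sketch simulates a GARCH(1,1) series with normal innovations and compares the sample variance with $\alpha_0/[1 - (\alpha_1 + \beta_1)]$. The function names simulate_garch11 and garch11_kurtosis, the parameter values, and the burn-in length are illustrative assumptions and do not come from the text.

```r
# Simulate a GARCH(1,1) series a_t = sigma_t * e_t with N(0,1) innovations.
# All parameter values below are illustrative assumptions.
simulate_garch11 <- function(n, alpha0, alpha1, beta1, burn = 500) {
  stopifnot(alpha0 > 0, alpha1 >= 0, beta1 >= 0, alpha1 + beta1 < 1)
  total <- n + burn
  e <- rnorm(total)
  a <- numeric(total)
  sigma2 <- numeric(total)
  sigma2[1] <- alpha0 / (1 - alpha1 - beta1)   # start at the unconditional variance
  a[1] <- sqrt(sigma2[1]) * e[1]
  for (t in 2:total) {
    sigma2[t] <- alpha0 + alpha1 * a[t - 1]^2 + beta1 * sigma2[t - 1]
    a[t] <- sqrt(sigma2[t]) * e[t]
  }
  keep <- (burn + 1):total                     # discard the burn-in period
  list(a = a[keep], sigma2 = sigma2[keep])
}

# Kurtosis of the marginal distribution under conditional normality;
# returns Inf when the fourth-moment condition fails.
garch11_kurtosis <- function(alpha1, beta1) {
  p <- alpha1 + beta1
  denom <- 1 - p^2 - 2 * alpha1^2
  if (denom <= 0) return(Inf)
  3 * (1 - p^2) / denom
}

set.seed(1)
alpha0 <- 0.1; alpha1 <- 0.1; beta1 <- 0.8
sim <- simulate_garch11(20000, alpha0, alpha1, beta1)
uncond_var <- alpha0 / (1 - alpha1 - beta1)
print(c(theoretical = uncond_var, sample = var(sim$a)))
print(garch11_kurtosis(alpha1, beta1))         # exceeds 3 whenever alpha1 > 0
```

The sample variance should be close to, but not exactly equal to, the theoretical value, since it is subject to sampling variation.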

Now let $v_t = a_t^2 - \sigma_t^2$ so that $\sigma_t^2 = a_t^2 - v_t$, where the $v_t$ have zero mean and are serially
uncorrelated. We then see that the GARCH(1,1) model can be rearranged as
$a_t^2 - v_t = \alpha_0 + \alpha_1 a_{t-1}^2 + \beta_1 (a_{t-1}^2 - v_{t-1})$, or

$a_t^2 = \alpha_0 + (\alpha_1 + \beta_1) a_{t-1}^2 + v_t - \beta_1 v_{t-1}$    (10.2.11)
The process of squared errors thus has the form of an ARMA(1,1) model with uncorrelated
innovations $v_t$. The $v_t$ are in general heteroscedastic, however. In the special case of $\beta_1 = 0$,
the model reduces to $a_t^2 = \alpha_0 + \alpha_1 a_{t-1}^2 + v_t$, which is the AR(1) form of the ARCH(1)
model. For the general GARCH(s, r) process, expression (10.2.11) generalizes to

$a_t^2 = \alpha_0 + \sum_{i=1}^{m} (\alpha_i + \beta_i) a_{t-i}^2 + v_t - \sum_{i=1}^{r} \beta_i v_{t-i}$

which has the form of an ARMA process for $a_t^2$ with AR order equal to $m = \max(r, s)$. The
autocorrelation structure of $a_t^2$ also mimics that of the ARMA process provided that the fourth
unconditional moment of $a_t$ is finite (Bollerslev, 1988).
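
The rearrangement leading to (10.2.11) is an algebraic identity, so it can be verified exactly on simulated data. The R sketch below does this for a GARCH(1,1) series; the parameter values and variable names are assumptions chosen for illustration only.

```r
# Verify numerically that a GARCH(1,1) series satisfies the ARMA(1,1)
# representation (10.2.11) for the squared errors.
set.seed(2)
n <- 1000
alpha0 <- 0.05; alpha1 <- 0.15; beta1 <- 0.75   # illustrative values
e <- rnorm(n)
a <- numeric(n)
sigma2 <- numeric(n)
sigma2[1] <- alpha0 / (1 - alpha1 - beta1)
a[1] <- sqrt(sigma2[1]) * e[1]
for (t in 2:n) {
  sigma2[t] <- alpha0 + alpha1 * a[t - 1]^2 + beta1 * sigma2[t - 1]
  a[t] <- sqrt(sigma2[t]) * e[t]
}

v <- a^2 - sigma2                 # v_t = a_t^2 - sigma_t^2
idx <- 2:n
lhs <- a[idx]^2
rhs <- alpha0 + (alpha1 + beta1) * a[idx - 1]^2 + v[idx] - beta1 * v[idx - 1]
max_abs_diff <- max(abs(lhs - rhs))
print(max_abs_diff)               # zero up to floating-point rounding

# The v_t should be approximately serially uncorrelated.
lag1_corr_v <- cor(v[-1], v[-n])
print(lag1_corr_v)
```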

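The necessary and sufficient condition for second-order stationarity of the GARCH(s, r)
process is

$\sum_{i=1}^{s} \alpha_i + \sum_{i=1}^{r} \beta_i = \sum_{i=1}^{m} (\alpha_i + \beta_i) < 1$

When this condition is met, the unconditional variance is

$\sigma_a^2 = \mathrm{var}[a_t] = \frac{\alpha_0}{1 - \sum_{i=1}^{m} (\alpha_i + \beta_i)}$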

This was shown by Bollerslev (1986) who also gave necessary and sufficient conditions for 
the existence of all higher order moments for the GARCH(1, 1) model and the fourth-order 
moments for GARCH(1, 2) and GARCH(2, 1) models. Extensions of these results have 
been given by He and Terasvirta (1999) and Ling and McAleer (2002), among others. The 
expressions for the higher order moments and the constraints on the parameters needed 
to ensure their existence become more complex for the higher order models. The model 
specification also becomes more difficult. On the other hand, numerous studies have shown 
that low-order models such as the GARCH(1, 1), GARCH(2, 1), and GARCH(1, 2) models 
are often adequate in practice, with the GARCH(1, 1) model being the most popular.

10.2.3 Model Building and Parameter Estimation 

Testing for ARCH/GARCH Effects. The preceding results motivate the use of the ACF 
and PACF of the squares $a_t^2$ for model specification and for basic preliminary checking
for the presence of ARCH/GARCH effects in the errors $a_t$. For an ARMA model with
heteroscedastic errors, a starting point for the analysis is an examination of the sample
ACF and PACF of the squared residuals $\hat{a}_t^2$ obtained from fitting an ARMA model to the
observed series. In particular, let $r_k(\hat{a}^2)$ denote the sample autocorrelations of the squared
residuals $\hat{a}_t^2$ so that

$r_k(\hat{a}^2) = \frac{\sum_{t=1}^{n-k} (\hat{a}_t^2 - \hat{\sigma}_a^2)(\hat{a}_{t+k}^2 - \hat{\sigma}_a^2)}{\sum_{t=1}^{n} (\hat{a}_t^2 - \hat{\sigma}_a^2)^2}$

where $\hat{\sigma}_a^2 = n^{-1} \sum_{t=1}^{n} \hat{a}_t^2$ is the residual variance estimate. Analogous to the modified
portmanteau statistic described in Section 8.2.2, McLeod and Li (1983) proposed the 
portmanteau statistic

$Q(\hat{a}^2) = n(n+2) \sum_{k=1}^{K} r_k^2(\hat{a}^2) / (n-k)$    (10.2.12)

to detect departures from the ARMA assumptions. As a portmanteau test, this test does
not assume a specific alternative, but the types of departures for which $Q(\hat{a}^2)$ can be useful
include conditional heteroscedasticity in the form of ARCH/GARCH effects and bilinear-type
nonlinearity in the conditional mean of the process (see Section 10.3 for discussion of
bilinear models). McLeod and Li (1983) showed that the statistic $Q(\hat{a}^2)$ has approximately
the $\chi^2$ distribution with $K$ degrees of freedom under the assumption that the ARMA model
alone is adequate. The distribution is similar to that of the usual portmanteau statistic $Q$
based on the residuals $\hat{a}_t$, with the exception that the degrees of freedom in the case of
(10.2.12) are not affected by the fact that $p + q$ ARMA parameters have been estimated.
The potentially more powerful portmanteau statistics by Pena and Rodriguez (2002, 2006)
discussed in Section 8.2 could also be applied to the squared residuals $\hat{a}_t^2$.
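
The statistic in (10.2.12) is straightforward to compute directly. The R sketch below is one possible implementation; the function name mcleod_li is an assumption, and the input here is simulated white noise rather than residuals from a fitted ARMA model. Because (10.2.12) is the Ljung-Box form applied to the squared residuals, the result can be compared with the base R function Box.test() applied to the squares.

```r
# McLeod-Li portmanteau statistic (10.2.12) computed from its definition.
mcleod_li <- function(resid, K) {
  n <- length(resid)
  x <- resid^2
  d <- x - mean(x)   # mean(x) plays the role of the residual variance estimate
  r <- sapply(1:K, function(k) sum(d[1:(n - k)] * d[(1 + k):n]) / sum(d^2))
  Q <- n * (n + 2) * sum(r^2 / (n - 1:K))
  list(r = r, Q = Q, p.value = pchisq(Q, df = K, lower.tail = FALSE))
}

set.seed(3)
resid <- rnorm(300)                    # stand-in for ARMA residuals (assumption)
out <- mcleod_li(resid, K = 10)
ref <- Box.test(resid^2, lag = 10, type = "Ljung-Box")
print(c(direct = out$Q, box_test = unname(ref$statistic)))
print(c(direct = out$p.value, box_test = ref$p.value))
```

The degrees of freedom are left at K, in line with the remark that they are not reduced by the number of estimated ARMA parameters.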

An alternative test for ARCH effects is the score or Lagrange multiplier test proposed by 
Engle (1982). The score statistic $\Lambda$ for testing the null hypothesis $H_0: \alpha_i = 0$, $i = 1, \ldots, s$,
has a convenient form and can be expressed as $n$ times the coefficient of determination in
the least-squares fitting of the auxiliary regression equation

$\hat{a}_t^2 = \alpha_0 + \alpha_1 \hat{a}_{t-1}^2 + \alpha_2 \hat{a}_{t-2}^2 + \cdots + \alpha_s \hat{a}_{t-s}^2 + \varepsilon_t$

Assuming normality of the $a_t$'s, the score statistic $\Lambda$ has an asymptotic $\chi^2$ distribution
with $s$ degrees of freedom under the null model of no ARCH effects. The test procedure is
thus to fit a time series model to the observed series, save the residuals $\hat{a}_t$, and regress the
squared residuals on a constant and $s$ lagged values of the $\hat{a}_t^2$. The resulting value of $nR^2$
is then referred to a $\chi^2$ distribution with $s$ degrees of freedom. Even though this test was
derived for the ARCH(s) model, it has been shown to be useful for detecting other forms
of conditional heteroscedasticity as well. Also, the test is asymptotically equivalent to the
McLeod-Li portmanteau test based on the autocorrelations of the squared residuals (see
Luukkonen et al., 1988b). Thus, although the latter was derived as a pure significance test,
it is also an LM test against ARCH effects.
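
The auxiliary-regression form of the score test can be sketched in a few lines of base R. The function name arch_lm_test is an assumption, and the multiplier used is the number of observations actually entering the auxiliary regression ($n - s$) rather than $n$; the two versions are asymptotically equivalent. The simulated ARCH(1) series and its parameter values are for illustration only.

```r
# Score (LM) test for ARCH effects via the auxiliary regression of squared
# residuals on a constant and s of their own lags.
arch_lm_test <- function(resid, s) {
  x <- resid^2
  lagged <- embed(x, s + 1)          # column 1: x_t; columns 2..(s+1): lags 1..s
  y <- lagged[, 1]
  X <- lagged[, -1, drop = FALSE]
  fit <- lm(y ~ X)
  stat <- length(y) * summary(fit)$r.squared   # (n - s) * R^2, an assumption
  list(statistic = stat, df = s,
       p.value = pchisq(stat, df = s, lower.tail = FALSE))
}

# Illustration on a simulated ARCH(1) series and on iid noise.
set.seed(4)
n <- 1000
alpha0 <- 0.5; alpha1 <- 0.5
a <- numeric(n)
a[1] <- rnorm(1, sd = sqrt(alpha0 / (1 - alpha1)))
for (t in 2:n) a[t] <- sqrt(alpha0 + alpha1 * a[t - 1]^2) * rnorm(1)

lm_arch <- arch_lm_test(a, s = 1)
lm_iid <- arch_lm_test(rnorm(n), s = 5)
print(unlist(lm_arch))
print(unlist(lm_iid))
```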

Parameter Estimation. The parameter estimation for models with ARCH or GARCH 
errors is typically performed using the conditional maximum likelihood method. For 
estimation of an ARMA model $\phi(B) z_t = \theta_0 + \theta(B) a_t$ with ARCH or GARCH errors
$a_t$, we assume that $a_t$ is conditionally normally distributed as $N(0, \sigma_t^2)$. The $z_t$ are
then conditionally normal, given $z_{t-1}, z_{t-2}, \ldots$, and from the joint density function
$p(z) = \prod_{t=1}^{n} p(z_t | z_{t-1}, \ldots, z_1)$ we obtain the log-likelihood function

$l = \log(L) = -\frac{n}{2} \log(2\pi) - \frac{1}{2} \sum_{t=1}^{n} \log(\sigma_t^2) - \frac{1}{2} \sum_{t=1}^{n} \frac{a_t^2}{\sigma_t^2}$    (10.2.13)

where $a_t = z_t - \theta_0 - \sum_{i=1}^{p} \phi_i z_{t-i} + \sum_{i=1}^{q} \theta_i a_{t-i}$ and $\sigma_t^2$ is given by (10.2.2) or (10.2.10).

A discussion of the iterative maximization of the likelihood function along with other 
results related to the parameter estimation can be found, for example, in Engle (1982), Weiss 
(1984,1986), and Bollerslev (1986). When an ARMA model with ARCH or GARCH errors 
is fitted to the series, the information matrix of the log-likelihood is block diagonal with
respect to the conditional mean and variance parameters, so that iterations can be carried 
out separately with respect to the two sets of parameters. The so-called BHHH algorithm 
by Berndt, Hall, Hall, and Hausman (1974) provides a convenient method to perform 
the calculations. This algorithm has the advantage that only first-order derivatives are 
needed for the optimization. These derivatives can be evaluated numerically or analytically. 
Use of analytical first derivatives is often recommended as they improve the precision of 
the parameter estimates. Provided that the fourth-order moment of the process is finite, 
the resulting estimates of the ARMA-ARCH parameters are consistent and asymptotically 
normal as shown by Weiss (1986). 
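
To make the structure of (10.2.13) concrete, the following R sketch evaluates the negative conditional log-likelihood for a constant-mean model with GARCH(1,1) errors and minimizes it numerically. It uses the general-purpose Nelder-Mead routine in optim() rather than the BHHH algorithm mentioned above, and it is not the procedure used by any particular package. The function name garch11_negloglik, the start-up value for $\sigma_1^2$, the penalty for infeasible parameters, and the simulated data are all assumptions.

```r
# Negative Gaussian conditional log-likelihood for z_t = mu + a_t with
# GARCH(1,1) errors; par = c(mu, alpha0, alpha1, beta1).
garch11_negloglik <- function(par, z) {
  mu <- par[1]; alpha0 <- par[2]; alpha1 <- par[3]; beta1 <- par[4]
  # Large penalty outside the admissible region (an assumption of this sketch).
  if (alpha0 <= 0 || alpha1 < 0 || beta1 < 0 || alpha1 + beta1 >= 1) return(1e10)
  a <- z - mu
  n <- length(a)
  sigma2 <- numeric(n)
  sigma2[1] <- var(z)               # start-up value for the recursion (assumption)
  for (t in 2:n) {
    sigma2[t] <- alpha0 + alpha1 * a[t - 1]^2 + beta1 * sigma2[t - 1]
  }
  0.5 * sum(log(2 * pi) + log(sigma2) + a^2 / sigma2)
}

# Simulated data with known parameters (illustrative values).
set.seed(5)
n <- 1500
mu0 <- 0.1; a0 <- 0.1; a1 <- 0.1; b1 <- 0.8
a <- numeric(n); s2 <- numeric(n)
s2[1] <- a0 / (1 - a1 - b1)
a[1] <- sqrt(s2[1]) * rnorm(1)
for (t in 2:n) {
  s2[t] <- a0 + a1 * a[t - 1]^2 + b1 * s2[t - 1]
  a[t] <- sqrt(s2[t]) * rnorm(1)
}
z <- mu0 + a

start <- c(mean(z), 0.1 * var(z), 0.05, 0.85)
fit <- optim(start, garch11_negloglik, z = z, control = list(maxit = 2000))
print(round(fit$par, 4))            # estimates of mu, alpha0, alpha1, beta1
```

In practice, dedicated routines with analytical derivatives are preferable; the point here is only to show how the recursion for $\sigma_t^2$ enters the likelihood.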

The normal distribution was originally proposed by Engle (1982) to model the conditional
distribution of the disturbances $a_t$. As discussed earlier, the conditional normal
distribution results in a leptokurtic unconditional distribution. Nevertheless, in financial
applications the normal distribution sometimes fails to capture the excess kurtosis that is
present in stock returns and other variables. To overcome this drawback, Bollerslev (1987)
suggested using a standardized Student t-distribution with $\nu > 2$ degrees of freedom for the
estimation. The density function of the t-distribution is

$f(e_t | \nu) = \frac{\Gamma((\nu+1)/2)}{\Gamma(\nu/2) \sqrt{\pi(\nu-2)}} \left(1 + \frac{e_t^2}{\nu-2}\right)^{-(\nu+1)/2}$

where $\Gamma(\nu) = \int_0^\infty e^{-x} x^{\nu-1} dx$ is the Gamma function and $\nu$ measures the tail thickness.
As is well known, the distribution is symmetric around zero and approaches a normal
distribution as $\nu \to \infty$. For $\nu > 4$, the fourth moment exists and the conditional kurtosis
equals $3(\nu - 2)/(\nu - 4)$. Since this value exceeds 3, the tails are heavier than those of the
normal distribution. The log-likelihood function based on the t-distribution is given by


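$l = \log(L) = n \left[ \log \Gamma\left(\frac{\nu+1}{2}\right) - \log \Gamma\left(\frac{\nu}{2}\right) - \frac{1}{2} \log(\pi(\nu-2)) \right] - \frac{1}{2} \sum_{t=1}^{n} \left[ \log(\sigma_t^2) + (1+\nu) \log\left(1 + \frac{a_t^2}{\sigma_t^2 (\nu-2)}\right) \right]$

Here, $\nu$ is either prespecified or estimated jointly with other parameters. If $\nu$ is specified
in advance, values between 5 and 8 are often used; see Tsay (2010). With $\nu$ prespecified,
the conditional likelihood function is maximized by minimizing the sum in the second term
of the likelihood function given above.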
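
The standardized t density and the associated log-likelihood can be written down directly in R. In the sketch below, the names dstd_t and t_loglik and the numerical inputs are assumptions; the density is checked against a rescaled version of the base R function dt(), and the log-likelihood is built from the density so that it can be compared with the closed-form expression above.

```r
# Standardized Student t density (mean 0, variance 1) for nu > 2.
dstd_t <- function(x, nu) {
  stopifnot(nu > 2)
  exp(lgamma((nu + 1) / 2) - lgamma(nu / 2) - 0.5 * log(pi * (nu - 2))) *
    (1 + x^2 / (nu - 2))^(-(nu + 1) / 2)
}

# Log-likelihood of shocks a_t with conditional variances sigma2_t and
# standardized t innovations: the density of a_t is f(a_t / sigma_t) / sigma_t.
t_loglik <- function(a, sigma2, nu) {
  sum(log(dstd_t(a / sqrt(sigma2), nu)) - 0.5 * log(sigma2))
}

nu <- 5
x <- seq(-4, 4, by = 0.5)
scale <- sqrt(nu / (nu - 2))
max_diff <- max(abs(dstd_t(x, nu) - scale * dt(x * scale, df = nu)))
total_mass <- integrate(dstd_t, -Inf, Inf, nu = nu)$value
variance <- integrate(function(u) u^2 * dstd_t(u, nu), -Inf, Inf)$value
print(c(max_diff = max_diff, total_mass = total_mass, variance = variance))

# Compare with the closed-form log-likelihood on arbitrary illustrative inputs.
a <- c(0.3, -1.2, 0.8, 2.1)
sigma2 <- c(1.0, 1.5, 0.7, 2.0)
n <- length(a)
closed_form <- n * (lgamma((nu + 1) / 2) - lgamma(nu / 2) - 0.5 * log(pi * (nu - 2))) -
  0.5 * sum(log(sigma2) + (1 + nu) * log(1 + a^2 / (sigma2 * (nu - 2))))
print(c(from_density = t_loglik(a, sigma2, nu), closed_form = closed_form))
```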

Nelson (1991) suggested using the generalized error distribution (GED) for the estimation.
The density function of a GED random variable normalized to have mean zero and
variance one is given by

$f(x) = \frac{\eta \exp(-0.5 |x/\lambda|^{\eta})}{\lambda 2^{(1+1/\eta)} \Gamma(1/\eta)}$

where $\lambda = [2^{-2/\eta} \Gamma(1/\eta) / \Gamma(3/\eta)]^{1/2}$. For the tail thickness parameter $\eta = 2$, the
distribution equals the normal distribution used in (10.2.13). For $\eta < 2$, the distribution
has thicker tails than the normal distribution. The reverse is true for $\eta > 2$. Box and Tiao
(1973) call the GED distribution an exponential power distribution.
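
The GED density is equally simple to code, and the special case $\eta = 2$ provides a convenient check against the standard normal density. The function name ged_density and the chosen values of $\eta$ in this R sketch are assumptions.

```r
# Generalized error distribution density, normalized to mean 0 and variance 1.
ged_density <- function(x, eta) {
  stopifnot(eta > 0)
  lambda <- sqrt(2^(-2 / eta) * gamma(1 / eta) / gamma(3 / eta))
  eta * exp(-0.5 * abs(x / lambda)^eta) /
    (lambda * 2^(1 + 1 / eta) * gamma(1 / eta))
}

x <- seq(-4, 4, by = 0.25)
normal_gap <- max(abs(ged_density(x, eta = 2) - dnorm(x)))   # eta = 2: normal case

# Total mass and variance for a thicker-tailed case (eta < 2).
mass_15 <- integrate(ged_density, -Inf, Inf, eta = 1.5)$value
var_15 <- integrate(function(u) u^2 * ged_density(u, eta = 1.5), -Inf, Inf)$value
print(c(normal_gap = normal_gap, mass = mass_15, variance = var_15))

# Tail comparison at x = 4: thicker tails for eta < 2, thinner for eta > 2.
print(c(eta_1.5 = ged_density(4, 1.5), normal = dnorm(4), eta_3 = ged_density(4, 3)))
```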
In addition to having excess kurtosis, the distribution of $a_t$ may also be skewed. A
discussion of potential sources for skewness can be found in He et al. (2008). To allow for
skewness as well as heavy tails, the likelihood calculations can be based on skewed versions
of the Student t-distribution and the GED distributions available in software packages such
as R. Other forms of skewed distributions have also been considered.

In practice, it is often difficult to know whether the specified probability distribution is the 
correct one. An alternative approach is to continue to base the parameter estimation on the 
normal likelihood function in (10.2.13). This method is commonly referred to as
quasi-maximum likelihood (QML) estimation. The asymptotic properties of the resulting QML
estimator for the ARCH, GARCH, and ARMA-GARCH models have been studied by 
many authors with early contributions provided by Weiss (1986) and Bollerslev and 
Wooldridge (1992). For further discussion and references, see, for example, Francq 
and Zakoian (2009, 2010). 


Diagnostic Checking. Methods for model checking include informal graphical checks using
time series plots and Q-Q plots of the residuals along with a study of their dependence
structure. The assumption underlying the ARCH and GARCH models is that the standardized
innovations $a_t/\sigma_t$ are independent and identically distributed. Having estimated the
parameters of the model, the adequacy of the mean value function can be checked by examining
the autocorrelation and partial autocorrelation functions of the standardized residuals $\hat{a}_t/\hat{\sigma}_t$.
Similar checks on the autocorrelation and partial autocorrelations of the squared standardized
residuals are useful for examining the adequacy of the volatility model. These checks
are often supplemented by the portmanteau test proposed by McLeod and Li (1983) or the
score test proposed by Engle (1982). However, while these statistics can provide useful
indications of lack of fit, their asymptotic distributions are impacted by the estimation of the
ARCH or GARCH parameters. Li and Mak (1994) derived an alternative portmanteau statistic
that asymptotically follows the correct $\chi_K^2$ distribution. This statistic is a quadratic form
in the first m autocorrelations of the squared standardized residuals but has a more complex
form than the Q statistic in (10.2.12). Analogous modifications of Engle’s score test based
on ARCH residuals were discussed by Lundbergh and Terasvirta (2002). More recent
contributions to model checking include work by Wong and Ling (2005), Ling and Tong
(2011), Fisher and Gallagher (2012), and many others.


10.2.4 An Illustrative Example: Weekly S&P 500 Log Returns 

To demonstrate the model building process, we consider the weekly log returns on the 
S&P 500 Index displayed in Figure 10.1(b) for the period January 3, 2000 to May 27, 
2014. Figure 10.2 shows the ACF of the returns along with the ACF of the squared returns. 
We note that there is little, if any, serial correlation in the returns themselves. The mean 
value function $\mu_t$ will thus be taken as a constant. However, the squared returns are clearly
correlated and show a pattern consistent with that of an ARCH or a GARCH model. The 
PACF of the squared returns (not shown) has a pattern that persists over several lags 
suggesting that a GARCH may be appropriate for the volatility. 

FIGURE 10.2 Autocorrelation functions for (a) the S&P 500 weekly log returns and (b) the squared
weekly log returns (horizontal axes: lag).

The parameters can be estimated in R using the function garchFit() in the fGarch
package. The normal distribution is the default error distribution for the ARCH or GARCH
models. Other options include the Student t-distribution and the GED distributions along
with skewed versions of these distributions. For demonstration, we will fit a GARCH(1, 1)
model with normal errors to the returns. The R commands and a partial model output are
provided below, where the log returns are denoted by SPrtn:


> library(fGarch)
> m1=garchFit(~garch(1,1),data=SPrtn,trace=F)
> summary(m1)   # Retrieve model output

Title: GARCH Modelling

Call: garchFit(formula=~garch(1,1),data=SPrtn,trace=F)

Mean and Variance Equation: data ~ garch(1,1)
Conditional Distribution: norm

Coefficient(s):
        mu       omega      alpha1       beta1
2.1875e-03  3.5266e-05  2.1680e-01  7.3889e-01

Error Analysis:
         Estimate  Std. Error  t value  Pr(>|t|)
mu      2.187e-03   6.875e-04    3.182   0.00146 **
omega   3.527e-05   1.153e-05    3.058   0.00223 **
alpha1  2.168e-01   4.189e-02    5.176  2.27e-07 ***
beta1   7.389e-01   4.553e-02   16.230   < 2e-16 ***

Standardised Residuals Tests:
                                 Statistic   p-Value
Jarque-Bera Test    R    Chi^2   77.92548    0
Shapiro-Wilk Test   R    W       0.9815283   3.990011e-08
Ljung-Box Test      R    Q(10)   6.910052    0.7339084
Ljung-Box Test      R    Q(20)   16.43491    0.689303
Ljung-Box Test      R^2  Q(10)   12.64346    0.244295
Ljung-Box Test      R^2  Q(20)   18.15442    0.5772367
LM Arch Test        R    TR^2    14.05565    0.297169

Information Criterion Statistics:
      AIC        BIC        SIC       HQIC
-4.751772  -4.727132  -4.751829  -4.742278

Letting w t denote the log returns, the fitted model is 

$w_t = 0.002187 + a_t$,    $\sigma_t^2 = 0.000035 + 0.2168 a_{t-1}^2 + 0.7389 \sigma_{t-1}^2$

where all the parameter estimates are statistically significant. The portmanteau tests for 
serial correlation in the standardized residuals and in their squared values indicate no 
lack of fit. However, the Jarque-Bera and Shapiro-Wilk tests for normality suggest that 
the model is not fully adequate. To examine this issue, the Student t-distribution and
its skewed version were tested by adding the argument cond.dist="std" and
cond.dist="sstd", respectively, to the garchFit command. The GED distribution and its skewed
version were also tested. Although these modifications improved the fit, the results are for 
simplicity not shown here. 

The standardized residuals from the fitted model and the ACF of the squared standardized 
residuals are shown in Figure 10.3. A normal Q-Q plot is also included in this graph. 
Visual inspection of the standardized residuals and the Q-Q plot confirms the results of 
the normality tests discussed above. The ACF of the squared residuals indicates no lack 
of fit although a marginally significant correlation is present at lag 1. This value would be 
reduced by fitting a GARCH(1, 2) model to the data. But this potential refinement is not 
pursued here. Finally, estimates of the conditional standard deviation $\sigma_t$ are displayed in
Figure 10.4(a). Figure 10.4(b) displays the log returns shown earlier in Figure 10.1(b) with
two standard deviation limits now superimposed around the series. A variety of other graphs
can be generated using the R command plot(m1), where m1 refers to the fitted model.
In addition, $l$-step-ahead forecasts of future volatility based on the conditional standard
deviations shown in Figure 10.4 can be generated using the R command predict(m1,1).
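
Some quantities implied by the fitted model follow from simple arithmetic on the coefficients reported above. The R sketch below computes the persistence $\alpha_1 + \beta_1$, the implied unconditional variance, and the quantity $(\alpha_1 + \beta_1)^2 + 2\alpha_1^2$ appearing in the fourth-moment condition of Section 10.2.2; with these point estimates the last quantity comes out slightly above one. The forecast function uses the standard GARCH(1,1) recursion, under which the $l$-step-ahead variance forecast reverts geometrically to the unconditional variance; its name and the starting value supplied to it are hypothetical and are not taken from the fitted object.

```r
# Coefficients copied from the garchFit output above.
omega <- 3.5266e-05
alpha1 <- 2.1680e-01
beta1 <- 7.3889e-01

persistence <- alpha1 + beta1
uncond_var <- omega / (1 - persistence)
uncond_sd <- sqrt(uncond_var)
fourth_moment_cond <- persistence^2 + 2 * alpha1^2
print(c(persistence = persistence, uncond_var = uncond_var,
        uncond_sd = uncond_sd, fourth_moment_cond = fourth_moment_cond))

# l-step-ahead conditional variance forecasts, l = 1, ..., l_max, given the
# one-step-ahead forecast sigma2_next (a hypothetical input).
forecast_variance <- function(sigma2_next, l_max) {
  uncond_var + persistence^(0:(l_max - 1)) * (sigma2_next - uncond_var)
}

# Example: start from a one-step forecast equal to twice the long-run variance.
fc <- forecast_variance(2 * uncond_var, l_max = 10)
print(sqrt(fc))   # forecast standard deviations decaying toward uncond_sd
```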

10.2.5 Extensions of the ARCH and GARCH Models 

While the ARCH and GARCH models allow for volatility clustering and capture thick-tailed
behavior of the underlying unconditional distributions, they do not account for certain
other features that are commonly observed in financial data. For example, so-called leverage 
effects are often observed in stock returns, where a negative innovation tends to increase 
the volatility more than a positive innovation of the same magnitude. In symmetric ARCH 
and GARCH models, on the other hand, the variance depends on the magnitude of the 
innovations but not their signs. Another limitation of the basic ARCH and GARCH models 
is the assumption that the conditional mean of the process is unaffected by the volatility. 
This assumption ignores the so-called risk premium that relates to the fact that investors 
expect to receive higher returns as compensation for taking on riskier assets. The presence 
of this feature would generate a positive relationship between expected return and volatility. 
Below we describe some extensions and modifications of the ARCH and GARCH models 
that have been proposed to address such issues. 



FIGURE 10.3 Model diagnostics for the GARCH(1, 1) model fitted to the S&P 500 weekly log
returns: (a) standardized residuals, (b) autocorrelation function of the squared standardized residuals,
and (c) a normal Q-Q plot of the standardized residuals.

FIGURE 10.4 Conditional standard deviations for the S&P 500 weekly log returns (a) and the
weekly log returns with two standard deviation limits imposed (b).
Exponential GARCH Models. The earliest model that allows for an asymmetric response
due to leverage effects is the exponential GARCH, or EGARCH, model introduced by
Nelson (1991). The EGARCH(1, 1) model is defined as $a_t = \sigma_t e_t$, where

$\ln(\sigma_t^2) = \alpha_0 + g(e_{t-1}) + \beta_1 \ln(\sigma_{t-1}^2)$

The function $g(e_{t-1})$ determines the asymmetry and is defined as the weighted innovation

$g(e_{t-1}) = \alpha_1 e_{t-1} + \gamma_1 [|e_{t-1}| - E(|e_{t-1}|)]$

where $\alpha_1$ and $\gamma_1$ are real constants. The model then becomes

$\ln(\sigma_t^2) = \alpha_0 + \alpha_1 e_{t-1} + \gamma_1 (|e_{t-1}| - E|e_{t-1}|) + \beta_1 \ln(\sigma_{t-1}^2)$

From here it is easy to see that a positive shock has the effect $(\alpha_1 + \gamma_1) e_{t-1}$ while a negative
shock has the effect $(\alpha_1 - \gamma_1) e_{t-1}$. The use of $g(e_{t-1})$ thus allows the model to respond
asymmetrically to “good news” and “bad news.” Since bad news typically has a larger
impact on volatility than good news, the value of $\alpha_1$ is expected to be negative when
leverage effects are present. Note that since the EGARCH model describes the relation
between the logarithm of the conditional variance and past information, the model does
not require any restrictions on the parameters to ensure that $\sigma_t^2$ is nonnegative. The general
EGARCH(s, r) model has the form

$\ln(\sigma_t^2) = \alpha_0 + \sum_{i=1}^{s} g_i(e_{t-i}) + \sum_{j=1}^{r} \beta_j \ln(\sigma_{t-j}^2)$

where

$g_i(e_{t-i}) = \alpha_i e_{t-i} + \gamma_i (|e_{t-i}| - E(|e_{t-i}|))$

However, as in the GARCH case, the first-order model is the most popular in practice.
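
The asymmetric response built into $g(\cdot)$ can be seen in a short R sketch. The code below assumes standard normal innovations, for which $E(|e_t|) = \sqrt{2/\pi}$; the function names egarch_g and simulate_egarch11 and all parameter values are illustrative assumptions.

```r
# Weighted innovation g(e) for the EGARCH(1,1) model with N(0,1) innovations.
egarch_g <- function(e, alpha1, gamma1) {
  alpha1 * e + gamma1 * (abs(e) - sqrt(2 / pi))   # E|e| = sqrt(2/pi) under normality
}

# Simulate an EGARCH(1,1) series; no positivity constraints are needed because
# the recursion is written for ln(sigma_t^2).
simulate_egarch11 <- function(n, alpha0, alpha1, gamma1, beta1) {
  stopifnot(abs(beta1) < 1)
  e <- rnorm(n)
  log_sigma2 <- numeric(n)
  log_sigma2[1] <- alpha0 / (1 - beta1)           # mean of ln(sigma_t^2), since E[g] = 0
  for (t in 2:n) {
    log_sigma2[t] <- alpha0 + egarch_g(e[t - 1], alpha1, gamma1) +
      beta1 * log_sigma2[t - 1]
  }
  sigma2 <- exp(log_sigma2)
  list(a = sqrt(sigma2) * e, sigma2 = sigma2)
}

alpha1 <- -0.1; gamma1 <- 0.2                     # alpha1 < 0: leverage effect
impact_pos <- egarch_g(1, alpha1, gamma1) - egarch_g(0, alpha1, gamma1)
impact_neg <- egarch_g(-1, alpha1, gamma1) - egarch_g(0, alpha1, gamma1)
print(c(positive_shock = impact_pos, negative_shock = impact_neg))

set.seed(9)
sim <- simulate_egarch11(500, alpha0 = -0.2, alpha1 = alpha1,
                         gamma1 = gamma1, beta1 = 0.95)
print(range(sim$sigma2))                          # strictly positive by construction
```

With these values a unit negative shock raises $\ln(\sigma_t^2)$ by $\gamma_1 - \alpha_1$ relative to a zero shock, while a unit positive shock raises it by only $\gamma_1 + \alpha_1$.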

Nelson (1991) specified the likelihood function assuming that the errors follow a generalized
error distribution that includes the normal distribution as a special case. Properties of
the QML estimator based on the normality assumption for the EGARCH(1, 1) model were 
studied by Straumann and Mikosch (2006) who verified the conditions for consistency of 
this estimator. Further properties and details related to the model building process can be 
found in Tsay (2010) and Terasvirta et al. (2010), for example. 

The GJR and Threshold GARCH Models. The so-called GJR-GARCH model of Glosten, 
Jagannathan, and Runkle (1993) and the threshold GARCH model of Zakoian (1994) 
provide an alternative way to allow for asymmetric effects of positive and negative volatility 
shocks. Starting from the GARCH(1, 1) model, the GJR model assumes that the parameter 
associated with $a_{t-1}^2$ depends on the sign of the shock so that

$\sigma_t^2 = \alpha_0 + (\alpha_1 + \gamma_1 I_{t-1}) a_{t-1}^2 + \beta_1 \sigma_{t-1}^2$

where the indicator variable $I_{t-1}$ assumes the value 1 if $a_{t-1}$ is negative and zero if it is
positive. The constraints on the parameters needed to ensure that the conditional variance
$\sigma_t^2$ is nonnegative are readily derived from those of the GARCH(1, 1) process. Using this
formulation, the squared noise term $a_{t-1}^2$ has a coefficient $\alpha_1 + \gamma_1$ when $a_{t-1}$ is negative, and
$\alpha_1$ when it is positive. This allows negative shocks to have a larger impact on the volatility. The
GJR model is relatively simple and empirical studies have shown that the model performs
well in practice. For the general GARCH(s, r) case, the model generalizes to

$\sigma_t^2 = \alpha_0 + \sum_{i=1}^{s} (\alpha_i + \gamma_i I_{t-i}) a_{t-i}^2 + \sum_{j=1}^{r} \beta_j \sigma_{t-j}^2$

although applications with r and s greater than 1 seem to be very rare. Zakoian (1994)
introduced a model with the same functional form as the GJR model, but instead of
modeling the conditional variance, Zakoian models the conditional standard deviation.
Since the coefficient associated with $a_{t-1}$ changes its value as $a_{t-1}$ crosses the threshold
zero, Zakoian referred to this model as a threshold GARCH, or TGARCH, model.
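
A minimal R sketch of the first-order GJR variance recursion is given below. The function name gjr_variance, the starting value sigma2_init, and the parameter values are assumptions; the two short shock sequences differ only in the sign of the first shock, which isolates the effect of the indicator term.

```r
# Conditional variance recursion for the GJR-GARCH(1,1) model given shocks a_t.
gjr_variance <- function(a, alpha0, alpha1, gamma1, beta1, sigma2_init) {
  n <- length(a)
  stopifnot(n >= 2)
  sigma2 <- numeric(n)
  sigma2[1] <- sigma2_init
  for (t in 2:n) {
    indicator <- as.numeric(a[t - 1] < 0)   # 1 for a negative shock, 0 otherwise
    sigma2[t] <- alpha0 + (alpha1 + gamma1 * indicator) * a[t - 1]^2 +
      beta1 * sigma2[t - 1]
  }
  sigma2
}

alpha0 <- 0.1; alpha1 <- 0.05; gamma1 <- 0.1; beta1 <- 0.8   # illustrative values
after_pos <- gjr_variance(c(1, 0), alpha0, alpha1, gamma1, beta1, sigma2_init = 1)
after_neg <- gjr_variance(c(-1, 0), alpha0, alpha1, gamma1, beta1, sigma2_init = 1)
print(c(after_positive_shock = after_pos[2], after_negative_shock = after_neg[2]))
```

The difference between the two next-period variances equals $\gamma_1 a_{t-1}^2$, here $\gamma_1$ itself since the shocks have unit size.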

Nonlinear Smooth Transition Models. For the threshold model described above, the
impact of past shocks changes abruptly as $a_{t-1}$ crosses the zero threshold. Attempts have
been made in the literature to develop nonlinear extensions of ARCH and GARCH models
that allow for more flexibility and a smoother transition as a lagged value $a_{t-i}$ crosses a
specified threshold. These extensions include the logistic smooth transition GARCH model
proposed by Hagerud (1997), and a similar model proposed independently by Gonzalez-Rivera
(1998). This model assumes that the model parameters $\alpha_i$ in the ARCH or GARCH
model are not constant but functions of the lagged $a_{t-i}$ so that $\alpha_i = \alpha_{1i} + \alpha_{2i} F(a_{t-i})$,
$i = 1, \ldots, s$, where $F(\cdot)$ is a transition function. Hagerud considered two transition functions,
the logistic and the exponential. The GARCH(s, r) model with a logistic transition function
has the form

$\sigma_t^2 = \alpha_0 + \sum_{i=1}^{s} [\alpha_{1i} + \alpha_{2i} F(a_{t-i})] a_{t-i}^2 + \sum_{j=1}^{r} \beta_j \sigma_{t-j}^2$

where

$F(a_{t-i}) = \frac{1}{1 + \exp(-\theta a_{t-i})} - \frac{1}{2}$

with $\theta > 0$. In contrast to the GJR model that follows one process when the innovations
are positive and another process when the innovations are negative, the transition between
the two states is smooth in the present model. Hagerud provided conditions for stationarity
and nonnegativity of the conditional variances.
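
The behavior of the logistic transition function is easy to examine numerically. In the R sketch below, the names logistic_transition and arch_coefficient and the parameter values are assumptions; plogis() is the base R logistic function, used here to avoid overflow for large $\theta$.

```r
# Logistic transition function F(a) = 1 / (1 + exp(-theta * a)) - 1/2.
logistic_transition <- function(a, theta) {
  stopifnot(theta > 0)
  plogis(theta * a) - 0.5
}

# Shock-dependent ARCH coefficient alpha_1i + alpha_2i * F(a).
arch_coefficient <- function(a, alpha_1i, alpha_2i, theta) {
  alpha_1i + alpha_2i * logistic_transition(a, theta)
}

shocks <- c(-2, -0.5, 0, 0.5, 2)
smooth <- logistic_transition(shocks, theta = 1)
abrupt <- logistic_transition(shocks, theta = 1e4)   # approaches a step at zero
print(rbind(shocks, smooth, abrupt))

# With alpha_2i < 0, negative shocks receive the larger coefficient.
print(arch_coefficient(shocks, alpha_1i = 0.10, alpha_2i = -0.08, theta = 2))
```

As $\theta$ grows, $F$ approaches a step function taking the values $-1/2$ and $1/2$, so the model behaves more like a threshold specification.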

Lanne and Saikkonen (2005) proposed a smooth transition GARCH process that uses 
the lagged conditional variance $\sigma_{t-1}^2$ as the transition variable, and is suitable for describing
high persistence in the conditional variance. The first-order version of this model can be
written as

$\sigma_t^2 = \alpha_0 + \alpha_1 a_{t-1}^2 + \delta_1 G_1(\theta; \sigma_{t-1}^2) + \beta_1 \sigma_{t-1}^2$

where the transition function $G_1(\theta; \sigma_{t-1}^2)$ is a continuous, monotonically increasing bounded
function of $\sigma_{t-1}^2$. Lanne and Saikkonen used the cumulative distribution function of the
gamma distribution as the transition function. The original purpose for introducing this
model was to remedy a tendency of GARCH models to exaggerate the persistence in
volatility as evidenced by $\sum (\alpha_i + \beta_i)$ often being very close to one. Using empirical examples
involving exchange rates, the authors showed that this formulation alleviates the problem
of exaggerated persistence. For further discussion of these and related models, see, for
example, Mills and Markellos (2008) and Terasvirta (2009).

GARCH-M Models. Many theories in finance postulate a direct relationship between the 
expected return on an investment and its risk. To account for this, the GARCH-in-mean, 
or GARCH-M, model, allows the conditional mean of a GARCH process to depend on 
the conditional variance $\sigma_t^2$. This model originates from the ARCH-M model proposed by
Engle et al. (1987). The mean value function is specified as

$\mu_t = \beta_0 + \beta_1 g(\sigma_t^2)$

where $g(\sigma_t^2)$ is a positive-valued function and $\beta_1$ is a positive constant called the risk
premium parameter. An increase or decrease in the conditional mean is here associated
with the sign of the partial derivative of the function $g(\sigma_t^2)$ with respect to $\sigma_t^2$. In many
applications, $g(\sigma_t^2)$ is taken to be the identity function or the square root function so that
$g(\sigma_t^2) = \sigma_t^2$ or $g(\sigma_t^2) = \sigma_t$. The parameters of the GARCH-M model can be estimated using
the maximum likelihood method. However, because of the dependence of the conditional
mean on the conditional variance, the information matrix is no longer block diagonal with
respect to the conditional mean and variance parameters. This makes joint maximization
of the likelihood function with respect to the two sets of parameters necessary. Also,
consistent estimation of the parameters in the GARCH-M models requires that the full model
be correctly specified. Applications of the GARCH-M model to stock returns, exchange
rates, and interest rates were discussed by Bollerslev et al. (1992). 
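
To illustrate how the conditional variance feeds into the mean, the following R sketch simulates a GARCH(1,1)-M series. The mean parameters are called b0 and b1 in the code to avoid a clash with the GARCH coefficient beta1; these names, the function simulate_garch_m, and all numerical values are assumptions made for illustration.

```r
# Simulate z_t = mu_t + a_t with mu_t = b0 + b1 * g(sigma_t^2) and GARCH(1,1)
# errors; g defaults to the identity function.
simulate_garch_m <- function(n, b0, b1, alpha0, alpha1, beta1, g = identity) {
  stopifnot(alpha0 > 0, alpha1 >= 0, beta1 >= 0, alpha1 + beta1 < 1)
  e <- rnorm(n)
  a <- numeric(n); sigma2 <- numeric(n); mu <- numeric(n)
  sigma2[1] <- alpha0 / (1 - alpha1 - beta1)
  for (t in 1:n) {
    if (t > 1) {
      sigma2[t] <- alpha0 + alpha1 * a[t - 1]^2 + beta1 * sigma2[t - 1]
    }
    mu[t] <- b0 + b1 * g(sigma2[t])     # risk premium term
    a[t] <- sqrt(sigma2[t]) * e[t]
  }
  list(z = mu + a, mu = mu, sigma2 = sigma2, a = a)
}

set.seed(12)
b0 <- 0.01; b1 <- 0.5
sim_var <- simulate_garch_m(1000, b0, b1, alpha0 = 0.05, alpha1 = 0.1, beta1 = 0.85)
sim_sd <- simulate_garch_m(1000, b0, b1, alpha0 = 0.05, alpha1 = 0.1, beta1 = 0.85,
                           g = sqrt)
print(cor(sim_var$mu, sim_var$sigma2))  # equals 1: mean is linear in the variance
print(summary(sim_sd$mu))
```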

IGARCH and FIGARCH Models. As noted earlier, the GARCH(1,1) model is weakly stationary
assuming that $\alpha_1 + \beta_1 < 1$. When the GARCH model is applied to high-frequency
financial data, it is often found that $\alpha_1 + \beta_1$ is close to or equal to 1. Engle and Bollerslev
(1986) refer to a model with $\alpha_1 + \beta_1 = 1$ as an integrated GARCH, or IGARCH, model.
The motivation is that this implies a unit root in the autoregressive part of the ARMA(1, 1)
representation of the GARCH(1, 1) model for $a_t^2$ in (10.2.11). With $\alpha_1 + \beta_1 = 1$, the model
becomes $(1 - B) a_t^2 = \alpha_0 + v_t - \beta_1 v_{t-1}$. Similar to a random walk process, this process is
not mean reverting since the unconditional variance of the process is not finite. Also, the
impact of a large shock on the forecasts of future values will not diminish for increasing lead
times. But while the IGARCH(1,1) process is not weakly stationary, Nelson (1990) showed
that the process has time-invariant probability distributions and is thus strictly stationary. A
necessary condition for strict stationarity is $E[\ln(\alpha_1 e_{t-1}^2 + \beta_1)] < 0$. For further discussion
of this model, see, for example, Terasvirta (2009).
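
Both points made above, the non-decaying forecasts and the log-moment condition, can be illustrated numerically. The R sketch below assumes standard normal innovations $e_t$; the function names nelson_log_moment and igarch_forecast and the parameter values are assumptions. The forecast function uses the recursion $E[\sigma_{t+l}^2 | F_t] = \alpha_0 + (\alpha_1 + \beta_1) E[\sigma_{t+l-1}^2 | F_t]$ with $\alpha_1 + \beta_1 = 1$.

```r
# E[ln(alpha1 * e^2 + beta1)] for e ~ N(0,1), by numerical integration.
nelson_log_moment <- function(alpha1, beta1) {
  stopifnot(alpha1 >= 0, beta1 > 0)
  integrate(function(e) log(alpha1 * e^2 + beta1) * dnorm(e), -Inf, Inf)$value
}

# l-step-ahead variance forecasts for IGARCH(1,1), l = 1, ..., l_max, given the
# one-step-ahead forecast sigma2_next (a hypothetical input).
igarch_forecast <- function(sigma2_next, alpha0, l_max) {
  sigma2_next + alpha0 * (0:(l_max - 1))
}

log_moment <- nelson_log_moment(alpha1 = 0.2, beta1 = 0.8)   # alpha1 + beta1 = 1
print(log_moment)                     # negative, consistent with strict stationarity

print(igarch_forecast(sigma2_next = 2, alpha0 = 0, l_max = 4))    # shock persists
print(igarch_forecast(sigma2_next = 1, alpha0 = 0.1, l_max = 3))  # linear growth
```

That the log moment is negative when $\alpha_1 + \beta_1 = 1$ follows from Jensen's inequality, since $E[\alpha_1 e_t^2 + \beta_1] = 1$.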

Fractionally integrated GARCH, or FIGARCH, models have also been proposed in the
literature. These differ from the IGARCH model in that the degree of differencing $d$ is
allowed to be a fraction rather than an integer. The FIGARCH(1, 1) model, in particular,
is of the form $(1 - B)^d a_t^2 = \alpha_0 + v_t - \beta_1 v_{t-1}$, where $d$ is a constant such that $0 < d < 0.5$.
For the FIGARCH model, the empirical autocorrelations of $a_t^2$ need not be very large but
they decay very slowly as the lag $k$ increases. This is indicative of so-called long memory
behavior in the series. Models involving fractional differencing will be discussed further
in Section 10.4 in relation to long-range dependence in the conditional mean.
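
The operator $(1 - B)^d$ with fractional $d$ is defined through its binomial expansion $\sum_{j \geq 0} \pi_j B^j$, whose coefficients satisfy $\pi_0 = 1$ and $\pi_j = \pi_{j-1}(j - 1 - d)/j$. The R sketch below computes these weights; the function name frac_diff_weights and the value $d = 0.4$ are assumptions.

```r
# Coefficients pi_0, ..., pi_{n_lags} in the expansion of (1 - B)^d.
frac_diff_weights <- function(d, n_lags) {
  stopifnot(n_lags >= 1)
  w <- numeric(n_lags + 1)
  w[1] <- 1                                    # pi_0
  for (j in 1:n_lags) {
    w[j + 1] <- w[j] * (j - 1 - d) / j         # pi_j from pi_{j-1}
  }
  w
}

w_frac <- frac_diff_weights(0.4, 5)
w_int <- frac_diff_weights(1, 3)               # ordinary differencing: 1, -1, 0, 0
print(w_frac)
print(w_int)

# Slow (hyperbolic) decay of the weights for fractional d.
w_long <- frac_diff_weights(0.4, 200)
print(abs(w_long[c(11, 51, 101, 201)]))
```

For integer $d$ the expansion terminates, whereas for fractional $d$ the weights decay slowly, which is the source of the long-memory behavior.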

Other Models. Numerous other models have been proposed to account for conditional 
heteroscedasticity. For example, a natural extension of the ARCH(s) model specified in 
(10.2.1) is to let $\sigma_t^2 = \alpha_0 + \mathbf{a}_{t-1}' \Omega \mathbf{a}_{t-1}$, where $\mathbf{a}_{t-1} = (a_{t-1}, \ldots, a_{t-s})'$ and $\Omega$ is an $s \times s$
nonnegative definite matrix. The ARCH(s) model is then a special case that requires
that Q be diagonal. One way that the above form can arise is through the conditional 
heteroscedastic ARMA (CHARMA) model specification discussed by Tsay (1987). Other 
approaches to volatility modeling include the random coefficient autoregressive model of 
Nicholls and Quinn (1982) and the stochastic volatility models of Melino and Turnbull 
(1990), Jacquier et al. (1994), and Harvey et al. (1994). A brief description of the stochastic 
volatility models is provided below. 


10.2.6 Stochastic Volatility Models 

Stochastic volatility models are similar to GARCH models but introduce a stochastic
innovation term to the equation that describes the evolution of the conditional variance $\sigma_t^2$.
To ensure positiveness of the conditional variances, stochastic volatility models are defined
in terms of $\ln(\sigma_t^2)$ instead of $\sigma_t^2$. A basic version of a stochastic volatility model is defined
by $a_t = \sigma_t e_t$ as in (10.2.1) with $\ln(\sigma_t^2)$ satisfying

$\ln(\sigma_t^2) = \alpha_0 + \beta_1 \ln(\sigma_{t-1}^2) + \cdots + \beta_r \ln(\sigma_{t-r}^2) + v_t$    (10.2.14)

where $e_t$ are iid normal $N(0, 1)$, $v_t$ are iid normal $N(0, \sigma_v^2)$, $\{e_t\}$ and $\{v_t\}$ are independent
processes, and the roots of the characteristic equation $1 - \sum_{i=1}^{r} \beta_i B^i = 0$ are outside the unit
circle. Note, for example, the stochastic volatility model equation for $r = 1$ is
$\ln(\sigma_t^2) = \alpha_0 + \beta_1 \ln(\sigma_{t-1}^2) + v_t$, which is somewhat analogous to the GARCH(1,1) model equation,
$\sigma_t^2 = \alpha_0 + \beta_1 \sigma_{t-1}^2 + \alpha_1 a_{t-1}^2$. Alternatively, replacing $g(e_{t-1})$ by $v_t$ in the EGARCH(1, 1) model,
we obtain (10.2.14) with $r = 1$. Some properties of the stochastic volatility model for
$r = 1$ are provided by Jacquier et al. (1994). Also note that we may write $a_t^2 = \sigma_t^2 e_t^2$ so that
$\ln(a_t^2) = \ln(\sigma_t^2) + \ln(e_t^2)$. This allows the stochastic volatility model to be viewed as a
state-space model, with the last relation representing the observation equation and the transition
equation being developed from (10.2.14). Difficulty in parameter estimation is increased
for stochastic volatility models, however, since likelihoods based on the state-space model 
are non-Gaussian. Quasi-likelihood methods may thus be needed. Jacquier et al. (1994) 
give a good summary of estimation techniques, including quasi-likelihood methods with 
Kalman filtering and the expectation maximization (EM) algorithm and Markov chain 
Monte Carlo (MCMC) methods. They also provide a comparison of estimation results 
between the different methods. 

A discussion and examples of the use of Markov chain Monte Carlo methods for 
parameter estimation can also be found in Tsay (2010, Chapter 12). A general overview of 
the stochastic volatility literature is given by a collection of articles in the books edited by 
Shephard (2005) and Andersen et al. (2009). 
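
A short simulation helps to fix ideas about the first-order stochastic volatility model and its state-space reading. In the R sketch below, the function name simulate_sv1, the burn-in length, and the parameter values are assumptions; the final lines confirm the observation equation $\ln(a_t^2) = \ln(\sigma_t^2) + \ln(e_t^2)$ on the simulated path.

```r
# Simulate the stochastic volatility model (10.2.14) with r = 1.
simulate_sv1 <- function(n, alpha0, beta1, sigma_v, burn = 200) {
  stopifnot(abs(beta1) < 1, sigma_v > 0)
  total <- n + burn
  e <- rnorm(total)                       # iid N(0, 1)
  v <- rnorm(total, sd = sigma_v)         # iid N(0, sigma_v^2), independent of e
  h <- numeric(total)                     # h_t = ln(sigma_t^2)
  h[1] <- alpha0 / (1 - beta1)            # start at the mean of the AR(1) state
  for (t in 2:total) {
    h[t] <- alpha0 + beta1 * h[t - 1] + v[t]   # transition equation
  }
  keep <- (burn + 1):total
  list(a = exp(h[keep] / 2) * e[keep], h = h[keep], e = e[keep])
}

set.seed(15)
sim <- simulate_sv1(1000, alpha0 = -0.4, beta1 = 0.95, sigma_v = 0.2)

# Observation equation of the state-space form: ln(a_t^2) = h_t + ln(e_t^2).
obs_gap <- max(abs(log(sim$a^2) - (sim$h + log(sim$e^2))))
print(obs_gap)

# The disturbance ln(e_t^2) is far from Gaussian, which is why the likelihood
# of the state-space form is non-Gaussian.
print(summary(log(sim$e^2)))
```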


10.3 NONLINEAR TIME SERIES MODELS 

Many processes occurring in the natural sciences, engineering, finance, and economics 
exhibit some form of nonlinear behavior. This includes features that cannot be modeled
using Gaussian linear processes such as lack of time reversibility evidenced, for example,
by pseudocyclical patterns where the values slowly rise to a peak and then quickly
decline to a trough. Time series that exhibit occasional bursts of outlying values are also
unlikely under the linear Gaussian assumption. The prevalence of such series has led to
an interest in developing nonlinear time series models that can account for such behavior.
Nonlinear models proposed in the literature include bilinear models, threshold autoregressive
(TAR) models, exponential autoregressive (EXPAR) models, and stochastic or
random coefficient models. These models describe nonlinearities in the conditional mean 
as opposed to nonlinearities in the conditional variance as discussed in Section 10.2. When 
nonlinearities are present, model identification and estimation become more complicated, 
including the fundamental problem of which type of nonlinear model might be useful for 
a particular time series. This section presents a brief description of some nonlinear models 
that have been proposed in the literature. More comprehensive discussions are available in 
texts such as Tong (1983, 1990), Priestley (1988), Franses and van Dijk (2000), Fan and 
Yao (2003), Tsay (2010, Chapter 4), and Terasvirta et al. (2010). 


10.3.1 Classes of Nonlinear Models 

Many nonlinear ARMA models can be viewed as special cases of the following general 
form: 


\[
z_t - \phi_1(Y_{t-1}) z_{t-1} - \cdots - \phi_p(Y_{t-1}) z_{t-p}
  = \theta_0(Y_{t-1}) + a_t - \theta_1(Y_{t-1}) a_{t-1} - \cdots - \theta_q(Y_{t-1}) a_{t-q} \tag{10.3.1}
\]

where

\[
Y_{t-1} = (z_{t-1}, \ldots, z_{t-p}, a_{t-1}, \ldots, a_{t-q})'
\]

and $\phi_i(Y_{t-1})$ and $\theta_i(Y_{t-1})$ are functions of the "state vector" $Y_{t-1}$ at time $t - 1$. For
specific cases, we mention the following models.

1. Bilinear Models. Let the $\phi_i$ be constants, and set $\theta_j(Y_{t-1}) = b_j + \sum_{i=1}^{k} b_{ij} z_{t-i}$. Then
we have the model

\[
z_t - \phi_1 z_{t-1} - \cdots - \phi_p z_{t-p}
  = \theta_0 + a_t - \sum_{j=1}^{q} b_j a_{t-j} - \sum_{i=1}^{k} \sum_{j=1}^{q} b_{ij} z_{t-i} a_{t-j} \tag{10.3.2}
\]

Equivalently, with the notations $p^* = \max(p, k)$, $\phi_i = 0$, $i > p$, $b_{ij} = 0$, $i > k$, and
$\alpha_i(t) = \sum_{j=1}^{q} b_{ij} a_{t-j}$, (10.3.2) can be expressed in the form

\[
z_t - \sum_{i=1}^{p^*} [\phi_i - \alpha_i(t)] z_{t-i} = \theta_0 + a_t - \sum_{j=1}^{q} b_j a_{t-j}
\]

and be viewed in the form of an ARMA model with random coefficients for the AR 
parameters, which are linear functions of past values of the innovations process a t . 
The statistical properties of bilinear models were studied extensively by Granger and 
Anderson (1978). Methods for analysis and parameter estimation were also studied 
by Subba Rao (1981) and Subba Rao and Gabr (1984), and various special cases of 
these models have been examined by subsequent authors.
Conditions for stationarity and other properties have been studied for the general
bilinear model by Tuan (1985, 1986) and Liu and Brockwell (1988), in particular.
For example, consider the simple first-order bilinear model $z_t - \phi_1 z_{t-1} =
a_t - b_{11} z_{t-1} a_{t-1}$. It is established that a condition for second-order stationarity of
such a process $\{z_t\}$ is $\phi_1^2 + \sigma_a^2 b_{11}^2 < 1$, and that the autocovariances of $z_t$ under
stationarity will satisfy $\gamma_j = \phi_1 \gamma_{j-1}$ for $j > 1$. Thus, this process will have essentially
the same autocovariance structure as an ARMA(1,1) process. This example highlights
the fact that moments higher than the second order are typically needed in
order to distinguish between linear and nonlinear models.

2. Amplitude-Dependent Exponential AR Models. Let $\theta_i = 0$, and set $\phi_i(Y_{t-1}) =
b_i + \pi_i e^{-c z_{t-1}^2}$, where $c > 0$ is a constant. Then we have

\[
z_t - \sum_{i=1}^{p} \left( b_i + \pi_i e^{-c z_{t-1}^2} \right) z_{t-i} = a_t \tag{10.3.3}
\]


This class of models was introduced by Haggan and Ozaki (1981), with an aim to 
construct models that reproduce features of nonlinear random vibration theory. 

3. Threshold AR, or TAR, Models. Let $\theta_i = 0$, $i \ge 1$, and for some integer time lag $d$
and some "threshold" constant $c$, let

\[
\phi_i(Y_{t-1}) =
\begin{cases}
\phi_i^{(1)} & \text{if } z_{t-d} \le c \\
\phi_i^{(2)} & \text{if } z_{t-d} > c
\end{cases}
\qquad
\theta_0(Y_{t-1}) =
\begin{cases}
\theta_0^{(1)} & \text{if } z_{t-d} \le c \\
\theta_0^{(2)} & \text{if } z_{t-d} > c
\end{cases}
\]

Then we have the model

\[
z_t =
\begin{cases}
\theta_0^{(1)} + \sum_{i=1}^{p} \phi_i^{(1)} z_{t-i} + a_t^{(1)} & \text{if } z_{t-d} \le c \\
\theta_0^{(2)} + \sum_{i=1}^{p} \phi_i^{(2)} z_{t-i} + a_t^{(2)} & \text{if } z_{t-d} > c
\end{cases}
\tag{10.3.4}
\]

where $\{a_t^{(1)}\}$ and $\{a_t^{(2)}\}$ are each white noise processes with variances $\sigma_1^2$ and $\sigma_2^2$,
respectively (e.g., we can take $a_t^{(j)} = \sigma_j a_t$). The value $c$ is called the threshold
parameter and $d$ is the delay parameter. A special case arises when the parameter $c$
is replaced by a lagged value of the series itself, resulting in a model called the
self-exciting TAR (SETAR) model.

The model (10.3.4) readily extends to an "$l$-threshold" model of the form

\[
z_t = \theta_0^{(j)} + \sum_{i=1}^{p} \phi_i^{(j)} z_{t-i} + a_t^{(j)}
\quad \text{if } c_{j-1} < z_{t-d} \le c_j, \qquad j = 1, \ldots, l
\]

with threshold parameters $c_1 < c_2 < \cdots < c_{l-1}$ (and $c_0 = -\infty$, $c_l = +\infty$), which
define a partition of the real line into $l$ subintervals. The first-order threshold model,

\[
z_t = \theta_0^{(j)} + \phi^{(j)} z_{t-1} + a_t^{(j)} \quad \text{if } c_{j-1} < z_{t-1} \le c_j
\]

for example, may thus be regarded as a piecewise linear approximation to a general
nonlinear first-order model $z_t = g(z_{t-1}) + a_t$, where $g(\cdot)$ is some general nonlinear
function.
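The regime switching in (10.3.4) can be made concrete with a small simulation. The following R sketch generates a two-regime, first-order threshold AR series. The function name `simulate_tar1`, its arguments, and all parameter values are illustrative assumptions and are not taken from the text.

```r
# Minimal sketch: simulate z_t = theta0[j] + phi[j] * z_{t-1} + sigma[j] * a_t,
# with regime j = 1 if z_{t-d} <= thr and j = 2 otherwise (hypothetical helper).
simulate_tar1 <- function(n, theta0, phi, thr, d = 1, sigma = c(1, 1),
                          z_init = 0, shocks = NULL) {
  if (is.null(shocks)) shocks <- rnorm(n)   # standard normal innovations a_t
  z <- c(rep(z_init, d), numeric(n))        # d start-up values, then the series
  regime <- integer(n)
  for (t in seq_len(n)) {
    k <- t + d                              # position of z_t in the padded vector
    j <- if (z[k - d] <= thr) 1L else 2L    # regime is selected by z_{t-d}
    regime[t] <- j
    z[k] <- theta0[j] + phi[j] * z[k - 1] + sigma[j] * shocks[t]
  }
  list(z = z[-seq_len(d)], regime = regime)
}

set.seed(1)
sim <- simulate_tar1(500, theta0 = c(0.5, -0.5), phi = c(0.6, -0.4), thr = 0)
table(sim$regime)   # number of observations generated in each regime
```

Passing a vector of zeros as `shocks` traces out the deterministic "skeleton" of the model, which is a convenient way to look for limit cycle behavior.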


The TAR models were introduced by Tong (1978) and Tong and Lim (1980) and
discussed in detail by Tong (1983, 1990). Tong (2007) gives a brief discussion of their
origin. The basic threshold AR model can be seen as a piecewise linear AR model, with a
somewhat abrupt change from one equation or "regime" to another dependent on whether
or not a threshold value $c_j$ is exceeded by $z_{t-d}$. A generalization that allows for less abrupt
transition from one regime to another has been developed as a class of models known as
smooth transition AR (STAR) models; see, for example, Teräsvirta (1994) and Teräsvirta
et al. (2010). For the case of a single threshold (two regimes), the basic form of a STAR model is

\[
z_t = \theta_0^{(1)} + \sum_{i=1}^{p} \phi_i^{(1)} z_{t-i}
  + \left( \theta_0^{(2)} + \sum_{i=1}^{p} \phi_i^{(2)} z_{t-i} \right) F(z_{t-d}) + a_t
\]

where $F(z) = 1/[1 + \exp\{-\gamma(z - c)\}]$ in the case of a logistic STAR model and in the
normal STAR model $F(z) = \Phi(\gamma(z - c))$, with $\Phi(\cdot)$ equal to the cumulative distribution
function of the standard normal distribution. By letting $\gamma \to \infty$, we see that $F(z)$ tends
to the indicator function, and the usual two-regime TAR model (10.3.4) is obtained as a
special case. The TAR model and its extensions have been used to model nonlinear series in
many diverse areas such as finance and economics, the environmental sciences, hydrology,
neural science, population dynamics, and physics; for selected references, see Fan and Yao
(2003, p. 126).
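The limiting behavior of the transition function is easy to check numerically. The R sketch below evaluates the logistic and normal transition functions on a few points and shows the logistic form approaching an indicator as the slope grows. The function names, the grid of values, and the threshold 3.1 are illustrative assumptions.

```r
# Transition functions of a STAR model (hypothetical helper names).
logistic_F <- function(z, gamma, c_thr) 1 / (1 + exp(-gamma * (z - c_thr)))
normal_F   <- function(z, gamma, c_thr) pnorm(gamma * (z - c_thr))

z_grid <- c(2.6, 3.0, 3.2, 3.6)   # illustrative values on both sides of c = 3.1
for (g in c(1, 10, 1000)) {
  # as gamma increases, F(z) moves toward 0 below the threshold and 1 above it
  print(round(logistic_F(z_grid, gamma = g, c_thr = 3.1), 4))
}
```

A small value of the slope gives a gradual blend of the two regimes, whereas a very large value reproduces the abrupt switch of the two-regime TAR model.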

Other types of nonlinear models include the stochastic or random coefficient models.
For example, in the simple AR(1) model we consider $z_t = \phi_t z_{t-1} + a_t$, where $\phi_t$ is not a
constant but is a stochastic parameter. Possible assumptions on the mechanism generating
the $\phi_t$ include (i) the $\phi_t$ are iid random variables with mean $\phi$ and variance $\sigma_\phi^2$, independent
of the process $\{a_t\}$, and (ii) the $\phi_t$ follow an AR(1) process themselves,

\[
\phi_t - \phi = \alpha(\phi_{t-1} - \phi) + e_t
\]

where $\phi$ is the mean of the $\phi_t$ process and the $e_t$ are iid random variables with mean 0
and variance $\sigma_e^2$, independent of $a_t$. Estimation for the first case was considered in detail
by Nicholls and Quinn (1982), while the second case may in principle be estimated using
state-space methods (e.g., Ledolter, 1981).

Additional classes of nonlinear models include the general state-dependent model form 
(10.3.1) examined extensively by Priestley (1980, 1988), or more general nonparametric 
autoregressive model forms such as nonlinear additive autoregressive models considered 
by Chen and Tsay (1993), and adaptive spline threshold autoregressive models used by 
Lewis and Stevens (1991). Nonparametric and semiparametric methods such as kernel 
regression and artificial neural networks have also been used to model nonlinearity. A 
review of nonlinear time series models with special emphasis on nonparametric methods 
was provided by Tjøstheim (1994). More recent discussions of the developments in this
area can be found in Fan and Yao (2003), Gao (2007), and Teräsvirta et al. (2010). A
discussion of nonlinear models with applications to finance is provided by Tsay (2010,
Chapter 4).


10.3.2 Detection of Nonlinearity 

Many methods have been proposed to detect nonlinearity of a time series. In addition to 
informal graphical methods and inspection of higher order moments, such as third- and 
fourth-order moments, these include more formal test procedures by Hinich (1982), Subba 
Rao and Gabr (1980), McLeod and Li (1983), Keenan (1985), Tsay (1986a), Petruccelli 
and Davies (1986), Luukkonen et al. (1988a), and others. Some of these tests exploit the 
nonlinear dependence structure that is reflected in the higher order moments, and many 
of the tests are developed as portmanteau tests based on a linear model, with an alternative
not explicitly specified. Other tests are Lagrange multiplier or score-type procedures
against specified alternative models. For example, the tests of Luukkonen et al. (1988a)
are score-type tests against STAR alternatives. The tests of Subba Rao and Gabr (1980)
and Hinich (1982) are nonparametric tests that use a bispectral approach, while the test of
Petruccelli and Davies (1986) is based on cumulative sums of standardized residuals from
autoregressive fitting to the data. The portmanteau test statistic (10.2.12) of McLeod and
Li (1983) is based on sample autocorrelations of squared residuals $\hat{a}_t^2$ from a fitted linear
ARMA model. This test was introduced as a test for nonlinearity, although simulations
suggest that it may be more powerful against ARCH alternatives. A modest gain in power
may be possible by basing the nonlinearity checks on the portmanteau statistics proposed
by Peña and Rodríguez (2002, 2006).

Keenan (1985) proposed an F-test for nonlinearity using an analogue of Tukey's
single-degree-of-freedom test for nonadditivity. The test is also similar to the regression
specification error test (RESET) proposed by Ramsey (1969) for linear regression models. The
test can be implemented by first fitting an AR($m$) model to the observed series $z_t$, where
$m$ is a suitably selected order. The fitted values are retained and their squares are added
as a predictor variable to the AR($m$) model. This model is then refitted and the coefficient
associated with the predictor variable is tested for significance. This procedure thus
amounts to determining whether inclusion of the squared predicted values helps improve
the prediction.
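The steps just described can be sketched in base R as follows. This is only an illustration of the "add the squared fitted values and refit" idea. It is not the implementation used by the TSA package, and the function name `keenan_sketch` and the choice $m = 4$ are assumptions made for the example.

```r
# Minimal sketch of a Keenan-type test via an augmented AR(m) regression.
keenan_sketch <- function(z, m) {
  X <- embed(z, m + 1)             # column 1 is z_t, columns 2..(m+1) are z_{t-1},...,z_{t-m}
  y <- X[, 1]
  lags <- X[, -1, drop = FALSE]
  fit0 <- lm(y ~ lags)             # linear AR(m) model fitted by least squares
  fit_sq <- fitted(fit0)^2         # squared fitted values
  fit1 <- lm(y ~ lags + fit_sq)    # AR(m) model with the extra predictor
  tab <- anova(fit0, fit1)         # F test for the added predictor
  list(statistic = tab$F[2],
       df = c(1, fit1$df.residual),
       p.value = tab$`Pr(>F)`[2])
}

z <- as.numeric(log10(datasets::lynx))
keenan_sketch(z, m = 4)
```

With $n$ observations, the augmented regression uses $n - m$ rows and $m + 2$ coefficients, so the error degrees of freedom are $n - 2m - 2$.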

Tsay (1986a) proposed an extension based on testing whether second-order terms have
additional predictive ability. The procedure can be carried out as follows: First fit a linear
AR($m$) model and obtain the residuals $\hat{a}_t$ from this fit. Then consider the $M = \frac{1}{2}m(m + 1)$
component vector

\[
Z_t = (z_{t-1}^2, \ldots, z_{t-m}^2, z_{t-1} z_{t-2}, \ldots, z_{t-m+1} z_{t-m})'
\]

consisting of all squares and distinct cross-products of the lagged values $z_{t-1}, \ldots, z_{t-m}$. Now
perform a multivariate least-squares regression of the elements of $Z_t$ on the set of regressors
$\{1, z_{t-1}, \ldots, z_{t-m}\}$ and obtain the multivariate residual vectors $\hat{U}_t$, for $t = m + 1, \ldots, n$.
Finally, perform a least-squares regression $\hat{a}_t = \hat{U}_t' \beta + e_t$ of the AR($m$) model residuals $\hat{a}_t$
on the $M$-dimensional vectors $\hat{U}_t$ as regressor variables, and let $F$ be the F ratio of the
regression mean square to the error mean square from that regression, so that

\[
F = \frac{ \left( \sum_t \hat{a}_t \hat{U}_t' \right) \left( \sum_t \hat{U}_t \hat{U}_t' \right)^{-1} \left( \sum_t \hat{U}_t \hat{a}_t \right) / M }
         { \sum_{t=m+1}^{n} \hat{e}_t^2 / (n - m - M - 1) } \tag{10.3.5}
\]

Under the assumption of linearity, $F$ has, for large $n$, an approximate F distribution with
$M$ and $n - m - M - 1$ degrees of freedom, and the null hypothesis of linearity is rejected
for large values of $F$. Extension to a procedure for residuals $\hat{a}_t$ from a fitted ARMA($p$, $q$)
model was also mentioned by Tsay (1986a).
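The three regressions above can be sketched in base R as follows. This is an illustration of the procedure as described in the text, not the code used by the TSA package. The function name `tsay_sketch` and the choice $m = 4$ are assumptions.

```r
# Minimal sketch of the Tsay (1986a) F statistic (10.3.5).
tsay_sketch <- function(z, m) {
  X <- embed(z, m + 1)
  y <- X[, 1]
  lags <- X[, -1, drop = FALSE]
  a_hat <- resid(lm(y ~ lags))                   # residuals from the linear AR(m) fit
  idx <- which(upper.tri(diag(m), diag = TRUE), arr.ind = TRUE)   # index pairs i <= j
  Z <- lags[, idx[, 1], drop = FALSE] * lags[, idx[, 2], drop = FALSE]  # squares and cross-products
  U <- resid(lm(Z ~ lags))                       # residuals of Z_t on {1, z_{t-1}, ..., z_{t-m}}
  fit <- lm(a_hat ~ U - 1)                       # regress the AR residuals on U_t
  M <- ncol(Z)
  df2 <- length(z) - m - M - 1
  rss <- sum(resid(fit)^2)
  Fstat <- ((sum(a_hat^2) - rss) / M) / (rss / df2)
  list(statistic = Fstat, df = c(M, df2),
       p.value = pf(Fstat, M, df2, lower.tail = FALSE))
}

z <- as.numeric(log10(datasets::lynx))
tsay_sketch(z, m = 4)
```

Because $M$ grows like $m^2/2$, a large order $m$ quickly uses up degrees of freedom, so a modest $m$ is advisable for short series.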

If one aggregates or condenses the information in the $M$-dimensional vector $Z_t$ into
a single variable $\hat{z}_t^2 = (\hat{\theta}_0 + \sum_{i=1}^{m} \hat{\phi}_i z_{t-i})^2$, which is the square of the fitted value from the
AR($m$) model, and performs the remaining steps outlined above, one obtains the earlier test
by Keenan (1985). The associated test statistic is

\[
F = \frac{ \left( \sum_t \hat{U}_t \hat{a}_t \right)^2 / \left( \sum_t \hat{U}_t^2 \right) }
         { \sum_{t=m+1}^{n} \hat{e}_t^2 / (n - 2m - 2) }
\]

with 1 and $n - 2m - 2$ degrees of freedom. Luukkonen et al. (1988b) and Tong (1990,
Section 5.3) noted a score test interpretation of the procedures proposed by Keenan (1985)
and Tsay (1986a). Both tests are available in the TSA package of R and can be implemented
using the commands Keenan.test(z) and Tsay.test(z). For further discussion, see Tsay
(2010, Chapter 4).


10.3.3 An Empirical Example 

For illustration, we consider modeling of the Canadian lynx dataset, consisting of annual
numbers of Canadian lynx trapped in the Mackenzie River district for the period 1821
to 1934. The series is available in the R datasets package. For several reasons, the
$\log_{10}$ transformation of the data is used in the analysis, denoted as $z_t$, $t = 1, \ldots, n$, with
$n = 114$. Examination of the time series plot of $z_t$ in Figure 10.5 shows a very strong
cyclical behavior, with period around 10 years. It also shows an asymmetry or lack of time 
reversibility in that the sample values rise to their peak or maximum values more slowly 
than they fall away to their minimum values (typically, about 6-year segments of rising and 
4-year segments of falling). This is a feature exhibited by many nonlinear processes. There 
are biological/population reasons that would also support a nonlinear process, especially 
one involving a threshold mechanism; see, for example, Tong (1990). 

The sample ACF and PACF of the series $\{z_t\}$ are shown in Figure 10.6. The ACF
exhibits the cyclic feature clearly, and based on features of the sample PACF a linear
AR(4) model is initially fitted to the series, with $\hat{\sigma}_a^2 = 0.0519$. The presence of some
moderate autocorrelation at higher lags, around lags 10 and 12, in the residuals from the
fitted AR(4) model suggested the following more refined model that was estimated by
conditional LS:

\[
z_t = 1.149 + 1.038 z_{t-1} - 0.413 z_{t-2} + 0.252 z_{t-3} - 0.229 z_{t-4}
      + 0.188 z_{t-9} - 0.232 z_{t-12} + a_t \tag{10.3.6}
\]

with residual variance estimate $\hat{\sigma}_a^2 = 0.0380$.

FIGURE 10.5 Logarithms (base 10) of the Canadian lynx time series for 1821-1934, with forecasts 
for 90 periods ahead from (a) the TAR model and (b) the linear subset AR(12) model. 


Some diagnostics of this fitted model suggest possible nonlinearity. Specifically, there
is strong autocorrelation in the squared residuals $\hat{a}_t^2$ at lag 2, with $r_2(\hat{a}^2) = 0.401$, and
nonlinear features exist in scatter plots of the "fitted values" $\hat{z}_t = \hat{z}_{t-1}(1)$ and residuals
$\hat{a}_t = z_t - \hat{z}_{t-1}(1)$ versus lagged values $z_{t-j}$, for lags $j = 2, 3, 4$. But the tests by Keenan
(1985) and Tsay (1986a), implemented in the TSA package of R, are inconclusive in that
the Keenan test rejects linearity whereas the Tsay test does not (see the output below).
However, it appears that the failure of the Tsay test to detect the nonlinearity may be due
to the way the package computes the Tsay statistic. This computation uses 77 parameters
and results in an observation/parameter ratio of 114/77 < 2, which is too small for valid
inference.

FIGURE 10.6 Autocorrelation and partial autocorrelation functions for the logarithm of the 
Canadian lynx series.


> library(TSA)
> data(lynx)
> z=log10(lynx)
> Keenan.test(z)
$test.stat: 11.66997
$p.value: 0.000955
$order: 11

> Tsay.test(z)
$test.stat: 1.316
$p.value: 0.2256
$order: 11

Tong (1990) specified a TAR model, with time delay of $d = 2$ and threshold value of about
$c \approx 3.10$ for this series. A threshold version of the AR model in (10.3.6), with two phases
and terms at lags 1, 2, 3, 4, 9, and 12, was estimated by conditional LS. After eliminating
nonsignificant parameter estimates, we arrived at the following estimated threshold AR
model:

\[
z_t =
\begin{cases}
1.3206 + 0.9427 z_{t-1} - 0.2161 z_{t-4} - 0.1411 z_{t-12} + a_t^{(1)} & \text{if } z_{t-2} \le 3.10 \\
1.8259 + 1.1971 z_{t-1} - 0.7266 z_{t-2} + 0.1667 z_{t-9} - 0.2229 z_{t-12} + a_t^{(2)} & \text{if } z_{t-2} > 3.10
\end{cases}
\]

with residual variance estimates $\hat{\sigma}_1^2 = 0.0249$ and $\hat{\sigma}_2^2 = 0.0386$ (pooled $\hat{\sigma}_a^2 = 0.0328$).
The approximate "eventual" forecast function from this model will lead to periodic
limit cycle behavior with an approximate period of 9 years (see Tong (1990) for discussion
of limit cycles). Although exact minimum MSE forecasts $\hat{z}_n(l)$ for lead times $l > 2$ are not
easily computed for the fitted threshold AR model, approximate forecasts for larger $l$ can
be obtained by projecting series values forward with future white noise terms $a_t^{(j)}$ set to 0
(see Teräsvirta et al. (2010, Chapter 14) for other options). Values obtained in this way for
the eventual forecast function from the TAR model are depicted for 90 years, $l = 1, \ldots, 90$,
in Figure 10.5(a). These values exhibit a limit cycle with a period of essentially 9 years
(in fact, the period is 28 years with 3 "subcycles"), and the asymmetric feature of slower
rise to peak values and faster fall to minimum values is visible. In contrast, the stationary
linear AR model will give a forecast function in the form of very slowly damped sinusoidal
oscillations that will eventually decay to the mean value of the process, 2.90. This forecast
function is shown in Figure 10.5(b).

Other nonlinear models have been considered for the Canadian lynx data. For example,
Subba Rao and Gabr (1984) have estimated a bilinear model for these data, an AR(2) 
model with random coefficients was fitted by Nicholls and Quinn (1982), and an amplitude- 
dependent exponential AR model of order 11 was fitted to the mean-adjusted log lynx data 
by Haggan and Ozaki (1981). 


10.4 LONG MEMORY TIME SERIES PROCESSES 

The autocorrelation function $\rho_k$ of a stationary ARMA($p$, $q$) process decreases rapidly as
$k \to \infty$, since the autocorrelation function is geometrically bounded so that

\[
|\rho_k| \le C R^k, \qquad k = 1, 2, \ldots
\]

where $C > 0$ and $0 < R < 1$. Processes with this property are often referred to as short
memory processes. Stationary processes with much more slowly decreasing autocorrelation
function, known as long memory processes, have

\[
\rho_k \sim C k^{2d-1} \quad \text{as } k \to \infty \tag{10.4.1}
\]

where $C > 0$ and $-0.5 < d < 0.5$. Empirical evidence suggests that long memory processes
are common in fields as diverse as hydrology (e.g., Hurst, 1951; McLeod and Hipel, 1978), 
geophysics, and financial economics. The sample autocorrelations of such processes are 
not necessarily large, but tend to persist over a long period. The latter could suggest a 
need for differencing to achieve stationarity, although taking a first difference may be too 
extreme. This motivates the notion of fractional differencing and consideration of the class 
of fractionally integrated processes. 


10.4.1 Fractionally Integrated Processes 

A notable class of stationary long memory processes $z_t$ is the fractionally integrated ARMA,
or ARFIMA, processes defined for $-0.5 < d < 0.5$ by the relation

\[
\phi(B)(1 - B)^d z_t = \theta(B) a_t \tag{10.4.2}
\]

where $\{a_t\}$ is a white noise sequence with zero mean and variance $\sigma_a^2$, and $\phi(B) = 0$ and
$\theta(B) = 0$ have all roots greater than one in absolute value. The class of models in (10.4.2)
was initially proposed and studied by Granger and Joyeux (1980) and Hosking (1981) as an 
intermediate compromise between fully integrated ARIMA processes and short memory 
ARMA processes. More comprehensive treatments of these models can be found in texts 
by Beran (1994), Robinson (2003), and Palma (2007). 

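For $d > -1$, the operator $(1 - B)^d$ in (10.4.2) is defined by the binomial expansion

\[
(1 - B)^d = \sum_{j=0}^{\infty} \pi_j B^j \tag{10.4.3}
\]

where $\pi_0 = 1$ and

\[
\pi_j = \frac{\Gamma(j - d)}{\Gamma(j + 1)\Gamma(-d)} = \prod_{0 < k \le j} \frac{k - 1 - d}{k}, \qquad j = 1, 2, \ldots \tag{10.4.4}
\]

and $\Gamma(x)$ is the gamma function. Hence, the $\pi_j$ follow the simple recursion

\[
\pi_j = \left[ \frac{j - 1 - d}{j} \right] \pi_{j-1}
\]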
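The recursion makes the weights in (10.4.3) cheap to compute. The R sketch below builds them with a cumulative product and compares the result with the gamma-function form in (10.4.4). The function name `frac_diff_weights` and the value $d = 0.3$ are illustrative assumptions.

```r
# Weights pi_0, pi_1, ..., pi_J of (1 - B)^d from pi_j = pi_{j-1} (j - 1 - d) / j.
frac_diff_weights <- function(d, n_weights) {
  j <- seq_len(n_weights)
  c(1, cumprod((j - 1 - d) / j))
}

d <- 0.3                                  # illustrative value of d
pi_rec <- frac_diff_weights(d, 10)
pi_gam <- gamma(1:10 - d) / (gamma(1:10 + 1) * gamma(-d))   # gamma-function form
all.equal(pi_rec[-1], pi_gam)             # the two forms should agree
```

For $d = 1$ the weights reduce to $1, -1, 0, 0, \ldots$, which is the ordinary first difference.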


A particular special case is the fractionally integrated white noise process $w_t$, defined
by

\[
(1 - B)^d w_t = a_t
\]

For $-0.5 < d < 0.5$, since the power series expansion of $\psi(B) = (1 - B)^{-d} = \sum_{j=0}^{\infty} \psi_j B^j$
converges for $|B| < 1$, it follows that such a process $\{w_t\}$ is stationary and has the infinite
MA representation

\[
w_t = (1 - B)^{-d} a_t = \sum_{j=0}^{\infty} \psi_j a_{t-j} \tag{10.4.5}
\]

where

\[
\psi_j = \frac{\Gamma(j + d)}{\Gamma(j + 1)\Gamma(d)} = \prod_{0 < k \le j} \frac{k - 1 + d}{k} \sim \frac{j^{d-1}}{\Gamma(d)} \quad \text{as } j \to \infty \tag{10.4.6}
\]


It can also be shown (Hosking, 1981; Brockwell and Davis, 1991, Chapter 12) that the
fractionally integrated white noise process has variance

\[
\gamma_0(w) = \operatorname{var}[w_t] = \sigma_a^2 \frac{\Gamma(1 - 2d)}{[\Gamma(1 - d)]^2}
\]

and ACF

\[
\rho_h(w) = \frac{\Gamma(h + d)\Gamma(1 - d)}{\Gamma(h - d + 1)\Gamma(d)} = \prod_{0 < k \le h} \frac{k - 1 + d}{k - d}, \qquad h = 1, 2, \ldots \tag{10.4.7}
\]

In particular, we have $\rho_1(w) = d/(1 - d)$, and $\rho_h(w) = [(h - 1 + d)/(h - d)]\rho_{h-1}(w)$. It
follows, using Stirling's formula $\Gamma(x) \sim \sqrt{2\pi}\, e^{-x+1}(x - 1)^{x-1/2}$ as $x \to \infty$, that the ACF
behaves like

\[
\rho_h(w) \sim h^{2d-1} \frac{\Gamma(1 - d)}{\Gamma(d)} \quad \text{as } h \to \infty
\]

the characteristic feature of the ACF of a long memory process. In addition, by use of the
Levinson-Durbin recursion algorithm described in Appendix A3.2, values for the partial
autocorrelations of the fractionally integrated white noise process can be determined by
induction and shown to be $\phi_{kk} = d/(k - d)$, $k = 1, 2, \ldots$.
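Both closed forms can be checked numerically. The R sketch below computes the ACF in (10.4.7) by its recursion and then applies a plain Levinson-Durbin recursion to recover the partial autocorrelations. The function names and the value $d = 0.3$ are illustrative assumptions.

```r
# ACF of fractionally integrated white noise: rho_h = rho_{h-1} (h - 1 + d) / (h - d).
fi_acf <- function(d, max_lag) {
  h <- seq_len(max_lag)
  cumprod((h - 1 + d) / (h - d))
}

# Levinson-Durbin recursion returning the partial autocorrelations phi_kk
# from autocorrelations rho_1, ..., rho_K.
pacf_from_acf <- function(rho) {
  K <- length(rho)
  pacf <- numeric(K)
  phi <- numeric(0)                         # holds phi_{k-1,1}, ..., phi_{k-1,k-1}
  for (k in seq_len(K)) {
    if (k == 1) {
      phi_kk <- rho[1]
      phi <- phi_kk
    } else {
      num <- rho[k] - sum(phi * rho[(k - 1):1])
      den <- 1 - sum(phi * rho[1:(k - 1)])
      phi_kk <- num / den
      phi <- c(phi - phi_kk * rev(phi), phi_kk)
    }
    pacf[k] <- phi_kk
  }
  pacf
}

d <- 0.3
rho <- fi_acf(d, 8)
cbind(lag = 1:8, acf = rho, pacf = pacf_from_acf(rho), closed_form = d / (1:8 - d))
```

The last two columns of the printed table should coincide, in line with $\phi_{kk} = d/(k - d)$.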

The fractionally integrated white noise process itself may be of limited use in modeling
long memory behavior since the single parameter $d$ can allow for only a restrictive class
of autocorrelation function forms. This process can be useful, however, in building the
more general class of long memory processes. In fact, we can see from the above definition
that a fractionally integrated ARIMA($p$, $d$, $q$) process, $\phi(B)(1 - B)^d z_t = \theta(B) a_t$, can be
interpreted as an "ARMA($p$, $q$) process driven by fractionally integrated white noise,"
that is, $\{z_t\}$ satisfies $\phi(B) z_t = \theta(B) w_t$, with $(1 - B)^d w_t = a_t$. From general results on
linear filtering, we see that the exact autocovariance function of $\{z_t\}$ can be expressed
in terms of the autocovariance function of the fractionally integrated white noise process
$\{w_t\}$ as

\[
\gamma_h(z) = \sum_{j=0}^{\infty} \sum_{k=0}^{\infty} \psi_j \psi_k \gamma_{h+j-k}(w) \tag{10.4.8}
\]

where the $\psi_j$ are the coefficients in $\psi(B) = \phi(B)^{-1}\theta(B) = \sum_{j=0}^{\infty} \psi_j B^j$ and

\[
\gamma_h(w) = \gamma_0(w)\rho_h(w)
  = \sigma_a^2 \frac{\Gamma(1 - 2d)\Gamma(h + d)}{\Gamma(h - d + 1)\Gamma(d)\Gamma(1 - d)}
  = \sigma_a^2 \frac{(-1)^h \Gamma(1 - 2d)}{\Gamma(h - d + 1)\Gamma(1 - h - d)}
\]

is the autocovariance function of the fractionally integrated white noise process $\{w_t\}$.

In terms of the spectrum, from (3.1.12) the spectrum of a fractionally integrated ARIMA
($p$, $d$, $q$) process $\{z_t\}$ is

\[
p_z(f) = 2\sigma_a^2 \left| 1 - e^{-i2\pi f} \right|^{-2d} \frac{|\theta(e^{-i2\pi f})|^2}{|\phi(e^{-i2\pi f})|^2}, \qquad 0 \le f \le \tfrac{1}{2} \tag{10.4.9}
\]

where $p_w(f) = 2\sigma_a^2 |1 - e^{-i2\pi f}|^{-2d} = 2\sigma_a^2 [2\sin(\pi f)]^{-2d}$ is the spectrum of the fractionally
integrated white noise process. In particular, we see that $p_z(f)$ does not remain finite as
$f \to 0$ for $0 < d < \tfrac{1}{2}$. Since $\sin(x) \sim x$ as $x \to 0$, we have the behavior that

\[
p_z(f) \sim 2\sigma_a^2 \left[ \frac{|\theta(1)|^2}{|\phi(1)|^2} \right] (2\pi f)^{-2d} = C^* f^{-2d} \quad \text{as } f \to 0
\]

which is a distinguishing feature of the spectrum of long memory processes, for $0 < d < \tfrac{1}{2}$.


Two Simple Special Cases. In practice, ARIMA($p$, $d$, $q$) models are likely to be most useful
for small values of $p$ and $q$. So, we mention a few specific details given by Hosking (1981)
about characteristics of two of the simplest such models. First, consider the fractional
ARIMA(1, $d$, 0) model, $(1 - \phi B)(1 - B)^d z_t = a_t$, with AR parameter $-1 < \phi < 1$. Then
$(1 - \phi B) z_t = w_t$ or $z_t = (1 - \phi B)^{-1} w_t = \sum_{j=0}^{\infty} \phi^j w_{t-j}$, so using (10.4.7) and (10.4.8) with
$\psi_j = \phi^j$ it follows that the autocorrelation function of $\{z_t\}$ is

\[
\rho_l(z) = \frac{\rho_l(w)}{1 - \phi} \,
  \frac{F(d + l, 1; 1 - d + l; \phi) + F(d - l, 1; 1 - d - l; \phi) - 1}{F(1 + d, 1; 1 - d; \phi)}
\]

where $F(a, b; c; x)$ is the hypergeometric function defined by

\[
F(a, b; c; x) = 1 + \frac{ab}{c \cdot 1} x + \frac{a(a + 1)b(b + 1)}{c(c + 1) \cdot 1 \cdot 2} x^2 + \cdots
  = \frac{\Gamma(c)}{\Gamma(a)\Gamma(b)} \sum_{k=0}^{\infty} \frac{\Gamma(a + k)\Gamma(b + k)}{\Gamma(c + k)\, k!} x^k
\]

and

\[
\gamma_0(z) = \gamma_0(w) \sum_{j=0}^{\infty} \sum_{k=0}^{\infty} \phi^{j+k} \rho_{j-k}(w)
  = \gamma_0(w) \frac{2F(d, 1; 1 - d; \phi) - 1}{1 - \phi^2}
  = \sigma_a^2 \frac{\Gamma(1 - 2d)}{\Gamma(1 - d)^2} \, \frac{F(1 + d, 1; 1 - d; \phi)}{1 + \phi}
\]

Given $\phi$ and $d$, values of $F(d + l, 1; 1 - d + l; \phi)$ required in computing the $\gamma_l(z) =
\gamma_0(z)\rho_l(z)$ may be obtained more conveniently using the recurrence relation

\[
F(d + l - 1, 1; 1 - d + l - 1; \phi) = \frac{d + l - 1}{1 - d + l - 1} \, \phi \, F(d + l, 1; 1 - d + l; \phi) + 1
\]

Second, for the fractional ARIMA(0, $d$, 1) model, $(1 - B)^d z_t = (1 - \theta B) a_t$, with $-1 < \theta <
1$, we have $z_t = (1 - \theta B) w_t$. So again using (10.4.7) and (10.4.8), now with $\psi_0 = 1$, $\psi_1 =
-\theta$, and $\psi_j = 0$ for $j > 1$, we find that

\[
\gamma_l(z) = \gamma_0(w) \left[ (1 + \theta^2)\rho_l(w) - \theta\rho_{l+1}(w) - \theta\rho_{l-1}(w) \right]
\]

and the ACF of $\{z_t\}$ is

\[
\rho_l(z) = \rho_l(w) \, \frac{a l^2 - (1 - d)^2}{l^2 - (1 - d)^2}
\]

where

\[
a = (1 - \theta)^2 \left[ 1 + \theta^2 - \frac{2\theta d}{1 - d} \right]^{-1}
\]

with

\[
\gamma_0(z) = \gamma_0(w) \left[ 1 + \theta^2 - 2\theta\rho_1(w) \right]
  = \sigma_a^2 \frac{\Gamma(1 - 2d)}{\Gamma(1 - d)^2} \left[ 1 + \theta^2 - \frac{2\theta d}{1 - d} \right]
\]


10.4.2 Estimation of Parameters

We first briefly mention the sampling properties of the sample mean

\[
\bar{z} = \frac{1}{n} \sum_{t=1}^{n} z_t
\]

for estimation of the mean $\mu = E[z_t]$ from a fractionally integrated ARMA process. From
the general result that $\operatorname{var}[\bar{z}] = (\gamma_0(z)/n)[1 + 2\sum_{h=1}^{n-1} \{(n - h)/n\}\rho_h(z)]$ and the property
that $\rho_h(z) \sim C h^{2d-1}$ as $h \to \infty$, it follows that

\[
n^{1-2d} \operatorname{var}[\bar{z}] \to C^*
\]

for $-0.5 < d < 0.5$, where $C^* > 0$ is a certain constant. Hence, we see that $\operatorname{var}[\bar{z}] \sim
C^*/n^{1-2d}$, whereas for short memory processes ($d = 0$), the variance of the sample mean
behaves like $\operatorname{var}[\bar{z}] \sim C^*/n$. Thus, for $0 < d < 0.5$, the process mean $\mu$ can be much less
accurately estimated by the sample mean. Equivalently, a much longer series length $n$ is
required for accurate estimation of $\mu$ for long memory processes. Hosking (1996) derived
asymptotic distribution results for sample autocorrelations $\hat{\rho}_l(z)$ of long memory processes.
Estimation of the parameters $d$, $\phi$, $\theta$, and $\sigma_a^2$ in a fractionally integrated ARIMA($p$, $d$, $q$)
process can be performed by maximum likelihood (e.g., Sowell, 1992). However, direct
evaluation of the exact likelihood function is rather slow due partly to the complicated
nature of the autocovariance function of the process. Therefore, approximate ML estimation
methods have been considered by Beran (1994, 1995) and others. Another convenient
approach is to obtain an estimate of the parameter $d$ initially by certain methods (e.g., using
a frequency-domain nonparametric approach; see Geweke and Porter-Hudak (1983)), and
then estimate $\phi$, $\theta$, and $\sigma_a^2$ by relatively standard ML methods for the given estimate of
$d$. Asymptotic normality and the form of limiting covariance matrix of (approximate) ML
estimators have been established by Beran (1995) and argued by Li and McLeod (1986).
Notice that for $d > 0.5$, the fractionally integrated ARMA process is nonstationary. For
such cases, in practice the typical procedure is to first difference the nonstationary process
in the usual way, thus reducing it to a fractionally integrated process with a parameter $d$ in
the "stationary" range $-0.5 < d < 0.5$.

One approximate maximum likelihood estimation method is suggested by expressing
the general fractional ARIMA process $z_t$ in (10.4.2) in the infinite AR form as

\[
z_t - \sum_{j=1}^{\infty} \pi_j^* z_{t-j} = a_t \tag{10.4.10}
\]

where

\[
\pi^*(B) = 1 - \sum_{j=1}^{\infty} \pi_j^* B^j = \theta^{-1}(B)\phi(B)(1 - B)^d
\]

The $\pi_j^*$ coefficients can be obtained recursively based on the relation $\theta(B)\pi^*(B) =
\phi(B)(1 - B)^d = \varphi(B)$, similar to Section 4.2.3, as

\[
\pi_j^* - \theta_1 \pi_{j-1}^* - \cdots - \theta_q \pi_{j-q}^* = \varphi_j, \qquad j = 1, 2, \ldots
\]

with the conventions $\pi_0^* = -1$ and $\pi_j^* = 0$ for $j < 0$,
where $\varphi(B) = \phi(B)(1 - B)^d = 1 - \sum_{j=1}^{\infty} \varphi_j B^j$. For example, in an ARIMA(1, $d$, 1) model,
the $\pi_j^*$ satisfy $\pi_j^* - \theta\pi_{j-1}^* = \varphi_j$ with $\varphi_j = \phi\pi_{j-1} - \pi_j$ for $j \ge 1$, where the $\pi_j$ are the
coefficients in (10.4.3) and (10.4.4). In the approximate maximum likelihood or least-squares
method, the truncated errors

\[
\varepsilon_t(\beta) = z_t - \sum_{j=1}^{t-1} \pi_j^* z_{t-j}, \qquad t = 1, \ldots, n \tag{10.4.11}
\]

are considered as a function of $\beta = (\phi', \theta', d)'$, and the estimate $\hat{\beta}$ is determined by
minimizing the sum of squares $S(\beta) = \sum_{t=1}^{n} \varepsilon_t^2(\beta)$. The corresponding approximate ML
estimate of $\sigma_a^2$ is then taken as $\hat{\sigma}_a^2 = S(\hat{\beta})/n$. For very long time series, it might be advisable
to discard the first several $\varepsilon_t^2(\beta)$ terms in the sum-of-squares function to be minimized (e.g.,
the first 10-20 values), to avoid the effects of the inaccuracy in the approximation (10.4.11)
for small values of $t$.

For practical implementation of the approximate maximum likelihood method, we might
consider the following modification suggested because the series $(1 - B)^d z_t$ follows the
ARMA($p$, $q$) model. Construct the series of truncated values of $(1 - B)^d z_t = \pi(B) z_t$ as

\[
z_t(d) = z_t + \sum_{j=1}^{t-1} \pi_j(d) z_{t-j}, \qquad t = 1, \ldots, n
\]

for each $d$ on a grid of values within $-0.5 < d < 0.5$, where the $\pi_j(d)$ are the coefficients in
(10.4.3) and (10.4.4). Then for each (fixed) value of $d$ on the grid, obtain ML estimates
of the ARMA parameters $\phi$, $\theta$, and $\sigma_a^2$, for the time series $z_1(d), \ldots, z_n(d)$, by the usual
likelihood and sum-of-squares methods of Chapter 7. The estimate $\hat{d}$ is then taken as the
value of $d$ that gives the minimum of the sum of squares or the maximum of the likelihood, and the estimates
$\hat{\phi}$, $\hat{\theta}$ associated with this value of $d$ are the corresponding approximate ML estimates.
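The grid procedure can be sketched in a few lines of R. The example below simulates a series by applying the truncated inverse filter $(1 - B)^{-0.3}$ to white noise, fractionally differences it over a grid of $d$ values, and fits an AR(1) model at each grid point with `arima`. The helper name `frac_diff`, the simulated data, the grid, and the AR(1) specification are all illustrative assumptions.

```r
# Truncated fractional difference: z_t(d) = sum_{j=0}^{t-1} pi_j(d) z_{t-j}.
frac_diff <- function(z, d) {
  n <- length(z)
  j <- seq_len(n - 1)
  w <- c(1, cumprod((j - 1 - d) / j))               # pi_0, ..., pi_{n-1}
  sapply(seq_len(n), function(t) sum(w[seq_len(t)] * z[t:1]))
}

set.seed(123)
a <- rnorm(300)
z <- frac_diff(a, -0.3)                             # truncated (1 - B)^{-0.3} a_t

grid_d <- seq(0, 0.45, by = 0.05)
loglik <- sapply(grid_d, function(d) {
  arima(frac_diff(z, d), order = c(1, 0, 0),
        include.mean = FALSE, method = "ML")$loglik
})
d_hat <- grid_d[which.max(loglik)]                  # grid value with the largest likelihood
d_hat
```

A finer grid, or a one-dimensional optimizer over $d$, could replace the coarse grid once the region of the maximum has been located.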

Estimation procedures directly extend to the more practical case of the fractional ARIMA
model with an unknown nonzero mean $\mu$,

\[
\phi(B)(1 - B)^d (z_t - \mu) = \theta(B) a_t
\]

Although asymptotic theory is established to show that estimation of the additional unknown
mean parameter $\mu$ does not affect the limiting distribution of the ARIMA parameter
estimates $\hat{\phi}$, $\hat{\theta}$, $\hat{d}$, empirical simulation evidence (e.g., Hauser, 1999; Cheang and Reinsel,
2003) suggests that sampling properties of these estimates can be adversely affected even
for moderately large sample lengths. This behavior may be related to previous discussion
concerning the lower accuracy in estimation of the mean $\mu$ of a fractional ARIMA process.
A possible remedy to obtain improved estimates of the ARIMA model parameters in
the case of an unknown mean $\mu$, or in situations of more general regression models with
fractional ARIMA noise, is use of the restricted maximum likelihood estimation method
as discussed in Section 9.5.2.

Forecasting. As with parameter estimation, forecasting for fractionally integrated ARMA
processes (10.4.2) is not as convenient as for ARIMA processes with nonnegative integer
value of $d$, because of the higher complexity of the differencing operator $(1 - B)^d$
in the fractional case. Unlike the standard ARIMA model, forecasts cannot be obtained
conveniently directly from a finite-order difference equation form. For the fractional
ARIMA model, it is simpler to consider forecasts based on the infinite AR form (10.4.10).
Then, similar to (5.3.5) and (5.3.6), from this form we obtain that the $l$-step-ahead forecast
of $z_{t+l}$ based on the infinite past observations through origin $t$, $z_t, z_{t-1}, \ldots$, is

\[
\hat{z}_t(l) = \sum_{j=1}^{\infty} \pi_j^* \hat{z}_t(l - j) \tag{10.4.12}
\]

where $\hat{z}_t(l - j) = z_{t+l-j}$ for $j \ge l$ as usual. For practical use, with forecasts based on a finite
series of $n$ available observations $z_1, \ldots, z_n$ and $n$ sufficiently large, the sum in (10.4.12)
must be truncated as $\hat{z}_n(l) = \sum_{j=1}^{n+l-1} \pi_j^* \hat{z}_n(l - j)$.

Conversely, the process $z_t$ has the infinite MA form

\[
z_t = \psi(B) a_t = \sum_{j=0}^{\infty} \psi_j a_{t-j}
\]

where $\psi(B) = \sum_{j=0}^{\infty} \psi_j B^j = \phi^{-1}(B)(1 - B)^{-d}\theta(B) = \varphi^{-1}(B)\theta(B)$. From the same reasoning
as in Chapter 5, we also have the equivalent representation of the lead-$l$ forecast in
(10.4.12) as

\[
\hat{z}_t(l) = \sum_{j=l}^{\infty} \psi_j a_{t+l-j} \tag{10.4.13}
\]

So the forecast error is $e_t(l) = z_{t+l} - \hat{z}_t(l) = \sum_{j=0}^{l-1} \psi_j a_{t+l-j}$, with variance

\[
\sigma^2(l) = \operatorname{var}[e_t(l)] = \sigma_a^2 \sum_{j=0}^{l-1} \psi_j^2
\]
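For the simplest case of fractionally integrated white noise, the $\psi_j$ weights follow the recursion in (10.4.6), so the forecast error variance is easy to tabulate. The R sketch below does this for a few lead times. The function names and the values $d = 0.41$ and $\sigma_a^2 = 0.0978$ are assumptions chosen for illustration.

```r
# psi_0, ..., psi_J for (1 - B)^{-d}: psi_j = psi_{j-1} (j - 1 + d) / j.
psi_fi <- function(d, max_lag) {
  j <- seq_len(max_lag)
  c(1, cumprod((j - 1 + d) / j))
}

# Lead-l forecast error variance: sigma2_a * sum_{j=0}^{l-1} psi_j^2.
forecast_var <- function(psi, sigma2_a, l) sigma2_a * sum(psi[seq_len(l)]^2)

d <- 0.41                      # illustrative value of d
sigma2_a <- 0.0978             # illustrative innovation variance
psi <- psi_fi(d, 29)           # psi_0 through psi_29
leads <- c(1, 5, 10, 30)
cbind(lead = leads,
      variance = sapply(leads, function(l) forecast_var(psi, sigma2_a, l)))
```

The variances increase slowly toward the process variance $\sigma_a^2 \Gamma(1 - 2d)/[\Gamma(1 - d)]^2$, reflecting the slow decay of the $\psi_j$.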

Example: Series A. Consider again Series A, which is a time series of chemical process
concentration readings with $n = 197$ observations. Two possible models were proposed for
this series in Chapters 6 and 7. One was the "nearly nonstationary" ARMA(1,1) model,
$(1 - \phi B) z_t = \theta_0 + (1 - \theta B) a_t$, with estimates $\hat{\phi} = 0.92$, $\hat{\theta} = 0.58$, $\hat{\theta}_0 = 1.45$, and $\hat{\sigma}_a^2 =
0.0974$. The second was the nonstationary IMA(0,1,1) model, $(1 - B) z_t = (1 - \theta B) a_t$,
with estimates $\hat{\theta} = 0.71$ and $\hat{\sigma}_a^2 = 0.1004$. The unit root test performed in Section 10.1
suggests that the nonstationary IMA(0, 1, 1) model may be more appropriate. Beran (1995)
also examined these data and found that an ARIMA(0, $d$, 0) model, that is, a fractionally
integrated white noise model, $(1 - B)^d (z_t - \mu) = a_t$, fits the series well, with estimates
$\hat{d} = 0.41$ and $\hat{\sigma}_a^2 = 0.0978$. Notice that the estimate of $d$ is less than, but close to, the
nonstationary boundary of $d = 0.5$ for an ARIMA(0, $d$, 0) process, giving further support
to the notion that it is very difficult to determine whether this process is stationary or not
based on the series length of only $n = 197$ observations. In certain respects, especially in
terms of long memory characteristics, the fractional ARIMA(0, $d$, 0) model of Beran (1995)
may be viewed as intermediate between the two models suggested earlier. For comparison,
in Table 10.1 we display the first 30 $\psi_j$ coefficients of the "infinite" MA representation for
each of the three models considered. Notice that while the $\psi_j$, for $j \ge 2$, are initially smaller
for the ARIMA(0, $d$, 0) model than for the ARMA(1,1) model, they decay relatively more
slowly and become larger than those of the ARMA(1,1) for all lags $j \ge 18$. In contrast,

TABLE 10.1 Coefficients $\psi_j$ of the "Infinite" MA Representations for Three ARIMA Models
Fitted to the Chemical Process Concentration Readings in Series A.


j 

ARMA 
(1. 1) 

IMA 
( 0 , 1 , 1) 

ARMA 

(0,rf,0) 

j 

ARMA 
(1. 1) 

IMA 
( 0 , 1 , 1) 

ARMA 
(0, d, 0) 

1 

0.34000 

0.290 

0.41000 

16 

0.09734 

0.290 

0.08938 

2 

0.31280 

0.290 

0.28905 

17 

0.08955 

0.290 

0.08628 

3 

0.28778 

0.290 

0.23220 

18 

0.08239 

0.290 

0.08345 

4 

0.26475 

0.290 

0.19795 

19 

0.07580 

0.290 

0.08086 

5 

0.24357 

0.290 

0.17460 

20 

0.06974 

0.290 

0.07848 

6 

0.22409 

0.290 

0.15743 

21 

0.06416 

0.290 

0.07627 

7 

0.20616 

0.290 

0.14416 

22 

0.05902 

0.290 

0.07423 

8 

0.18967 

0.290 

0.13353 

23 

0.05430 

0.290 

0.07232 

9 

0.17449 

0.290 

0.12477 

24 

0.04996 

0.290 

0.07054 

10 

0.16054 

0.290 

0.11741 

25 

0.04596 

0.290 

0.06888 

11 

0.14769 

0.290 

0.11111 

26 

0.04228 

0.290 

0.06732 

12 

0.13588 

0.290 

0.10565 

27 

0.03890 

0.290 

0.06585 

13 

0.12501 

0.290 

0.10086 

28 

0.03579 

0.290 

0.06446 

14 

0.11501 

0.290 

0.09661 

29 

0.03293 

0.290 

0.06315 

15 

0.10581 

0.290 

0.09281 

30 

0.03029 

0.290 

0.06190 


for the IMA(0,1,1) model we know that the y/j = 1 — 6, for all j > 1, do not decay, which 
may not be an appropriate feature of a model for this process. 
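The $\psi_j$ weights of the three models compared for Series A can be recomputed with a few lines of base R. This is only a sketch: it takes the parameter estimates quoted in the example as given, uses `ARMAtoMA` from the stats package for the ARMA(1,1) model, and uses the closed forms $\psi_j = 1 - \theta$ and $\psi_j = \prod_{k=1}^{j}(k - 1 + d)/k$ for the other two. The object names are hypothetical.

```r
lags <- 1:30
# ARMAtoMA uses the convention (1 + theta * B), so theta = 0.58 enters as ma = -0.58
psi_arma <- ARMAtoMA(ar = 0.92, ma = -0.58, lag.max = 30)
psi_ima  <- rep(1 - 0.71, 30)                    # psi_j = 1 - theta for all j >= 1
psi_frac <- cumprod((lags - 1 + 0.41) / lags)    # psi_j = prod_{k=1}^{j} (k - 1 + d) / k

tab <- data.frame(j = lags,
                  ARMA11   = round(psi_arma, 5),
                  IMA011   = round(psi_ima, 3),
                  ARIMA0d0 = round(psi_frac, 5))
print(tab)

# first lag j >= 2 at which the fractional weights exceed the ARMA(1,1) weights
crossover <- min(lags[lags >= 2 & psi_frac > psi_arma])
crossover
```

The geometric decay of the ARMA(1,1) weights eventually falls below the hyperbolic decay of the fractional weights, which is the long memory feature discussed in the example.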

Remark. The parameters of the ARIMA(0, d, 0) model can be estimated using the fracdiff
package in R as shown below. From the partial output included, we see that the estimates
$\hat{d} = 0.40$ and $\hat{\sigma}_a^2 = (0.3123734)^2 = 0.0976$ are close to the values quoted above.

> library(fracdiff)
> fracdiff(seriesA, nar=0, nma=0, M=30)
Call: fracdiff(x = seriesA, nar = 0, nma = 0, M = 30)
Coefficients: d = 0.4001903
sigma[eps] = 0.3123734


EXERCISES 

10.1 Download from the Internet the daily stock prices of a company of your choosing. 

(a) Plot the data using the graphics capabilities in R. Are there any unusual features 
worth noting? Perform a statistical test to determine the presence of a unit root 
in the series. 

(b) Compute and plot the series of daily log returns. Does the graph show evidence
of volatility clustering? Perform a statistical analysis to determine whether an
AR-ARCH model would be appropriate for your series. If so, fit the model to
the returns.

10.2 Daily closing prices of four major European stock indices are available for the
period 1991-1998 in the file "EuStockMarkets" in the R datasets package; see
help(EuStockMarkets) for details.

(a) Select two series and plot the data using R. Are there any unusual features worth
noting? Perform a statistical test to determine the presence of a unit root in these
series.

(b) Compute and plot the series of daily log returns. Do the graphs show evidence
of volatility clustering? Perform a statistical analysis to determine whether
AR-ARCH models would be appropriate for your series. State the final models
selected.


10.3 Consider the ARCH(1) process $\{a_t\}$ defined by $a_t = \sigma_t e_t$, with $\sigma_t^2 = \alpha_0 + \alpha_1 a_{t-1}^2$,
where the $e_t$ are independent, identically distributed variates with mean 0 and
variance 1, and assume that $0 < \alpha_1 < 1$.

(a) Verify that

$$a_t^2 = \alpha_0 \sum_{j=0}^{\infty} \alpha_1^j e_t^2 e_{t-1}^2 \cdots e_{t-j}^2$$

or

$$a_t = e_t \left[ \alpha_0 \left( 1 + \sum_{j=1}^{\infty} \alpha_1^j e_{t-1}^2 \cdots e_{t-j}^2 \right) \right]^{1/2}$$

provides a causal (strictly) stationary representation (solution) of the ARCH
model equations, that is, such that $\sigma_t^2 = \alpha_0 \left( 1 + \sum_{j=1}^{\infty} \alpha_1^j e_{t-1}^2 \cdots e_{t-j}^2 \right)$ satisfies

$$\sigma_t^2 = \alpha_0 + \alpha_1 a_{t-1}^2 = \alpha_0 + \alpha_1 e_{t-1}^2 \sigma_{t-1}^2$$

(b) Use the representation for $a_t$ in (a) to show that $E[a_t] = 0$, $E[a_t^2] = \operatorname{var}[a_t] =
\alpha_0/(1 - \alpha_1)$, and $E[a_t a_{t-k}] = \operatorname{cov}[a_t, a_{t-k}] = 0$ for $k \ne 0$.

(c) Define $X_t = a_t^2$ and assume, in addition, that $\alpha_1^2 < 1/3$, so that $E[a_t^4] < \infty$,
that is, $E[X_t^2] < \infty$. Show that the process $\{X_t\}$ satisfies the relation $X_t =
e_t^2(\alpha_0 + \alpha_1 X_{t-1})$, and deduce from this that the autocovariances of $\{X_t\}$ satisfy
$\operatorname{cov}[X_t, X_{t-k}] = \alpha_1 \operatorname{cov}[X_{t-1}, X_{t-k}]$ for $k \ge 1$. Hence, conclude that $\{X_t\}$ has the
same autocorrelation function as an AR(1) process with AR parameter $\phi = \alpha_1$.


10.4 Consider the GARCH(1,1) model $a_t = \sigma_t e_t$, where the $e_t$ are iid random variables
with mean 0 and variance 1, and $\sigma_t^2 = \alpha_0 + \alpha_1 a_{t-1}^2 + \beta_1 \sigma_{t-1}^2$. Show that the
unconditional variance of $a_t$ equals $\operatorname{var}[a_t] = \alpha_0/[1 - (\alpha_1 + \beta_1)]$.

10.5 Derive the five-step-ahead forecast of the conditional variance from a time origin 
h for the GARCH(1,1) process. Repeat the derivation for a GARCH(2,1) process. 

10.6 Suppose that a time series of stock returns $\{r_t\}$ can be represented using an
ARCH(1)-M process $r_t = \delta\sigma_t^2 + a_t$, $a_t = \sigma_t e_t$, and $\sigma_t^2 = \alpha_0 + \alpha_1 a_{t-1}^2$, where the $e_t$
are iid Normal(0, 1).

(a) Derive the conditional and unconditional mean of the series.

(b) Show that the ARCH-in-mean effect makes the $\{r_t\}$ serially correlated and
calculate the ACF $\rho_k$, $k = 1, 2, \ldots$.

10.7 Assume that $\{z_t\}$ is a stationary, zero mean, Gaussian process with autocovariance
function $\gamma_k(z)$ and autocorrelation function $\rho_k(z)$. Use the property that for zero
mean Gaussian variates,

$$E[z_t z_{t+i} z_{t+j} z_{t+k}] = E[z_t z_{t+i}] E[z_{t+j} z_{t+k}] + E[z_t z_{t+j}] E[z_{t+i} z_{t+k}] + E[z_t z_{t+k}] E[z_{t+i} z_{t+j}]$$

to show that $\operatorname{cov}[z_t^2, z_{t+k}^2] = 2\gamma_k^2(z)$ and hence that the autocorrelation function of
the process of squared values $X_t = z_t^2$ is $\rho_k(X) = \rho_k^2(z)$.

10.8 Consider the first-order bilinear model $z_t = \phi z_{t-1} + a_t - b z_{t-1} a_{t-1}$, where the $a_t$
are independent variates with mean 0 and variance $\sigma_a^2$. Assume the process $\{z_t\}$ is
stationary, which involves the condition that $\phi^2 + \sigma_a^2 b^2 < 1$, and assume that $\{z_t\}$
has a causal stationary representation of the form $z_t = a_t + f(a_{t-1}, a_{t-2}, \ldots)$.

(a) Verify that $E[z_t a_t] = \sigma_a^2$, and so also that $\mu_z = E[z_t]$ satisfies
$(1 - \phi)\mu_z = -b\sigma_a^2$.

(b) Establish that the autocovariances $\gamma_k$ of $\{z_t\}$ satisfy $\gamma_k = \phi\gamma_{k-1}$ for $k > 1$, so that
the process has the same autocovariance structure as an ARMA(1,1) process.

10.9 Consider the annual sunspot series referred to as Series E in this text. The series is
also available for a slightly longer time period as series "sunspot.year" in the R
datasets package.

(a) Plot the time series and fit an AR(3) model to the series. 

(b) Use the procedure described by McLeod and Li (1983) to test for nonlinearity 
in the series. 

(c) Repeat part (b) using the Keenan and Tsay tests for nonlinearity. 

(d) Describe how you might fit a nonlinear time series model to this series. 

10.10 Measurements of the annual flow of the river Nile at Aswan from 1871 to 1970 are 
provided as series “Nile” in the R datasets package; type help(Nile) for details. 

(a) Plot the data along with the ACF and PACF of the series. Fit an appropriate 
ARIMA model to this series and comment. 

(b) Perform a statistical analysis to determine whether there is evidence of long 
memory dependence in this series. 

(c) If the answer in (b) is affirmative, develop a fractionally integrated ARMA
(i.e., ARFIMA) model for the series.



PART THREE 


TRANSFER FUNCTION AND 
MULTIVARIATE MODEL BUILDING 


Suppose that X measures the level of an input to a dynamic system. For example, X might
be the concentration of some constituent in the feed to a chemical process. Suppose that
the level of X influences the level of a system output Y. For example, Y might be the
yield of product from the chemical process. It will usually be the case that because of the
inertia of the system, a change in X from one level to another will have no immediate effect
on the output but, instead, will produce a delayed response with Y eventually coming to
equilibrium at a new level. We refer to such a change as a dynamic response. A model that
describes this dynamic response is called a transfer function model. We shall suppose that
observations of input and output are made at equispaced intervals of time. The associated
transfer function model will then be called a discrete transfer function model.

Models of this kind can describe not only the behavior of industrial processes but also that
of economic and business systems. Transfer function model building is important because
it is only when the dynamic characteristics of a system are understood that intelligent
direction, manipulation, and control of the system are possible.

Even under carefully controlled conditions, influences other than X will affect Y. We
refer to the combined effect on Y of such influences as the disturbance or the noise. A
model that can be related to real data must take account of not only the dynamic relationship
associating X and Y but also the noise infecting the system. Such joint models are obtained
by combining a deterministic transfer function model with a stochastic noise model.

In Chapter 11 we introduce a class of linear transfer function models capable of representing
many of the dynamic relationships commonly met in practice. In Chapter 12
we show how, taking account of corrupting noise, they may be related to data. Given the
observed series X and Y, the development of the combined transfer function and noise
model is accomplished by procedures of identification, estimation, and diagnostic checking,
which closely parallel those already described for univariate time series. In Chapter 13 we
describe how simple pulse and step indicator variables can be used as inputs in transfer
function models to represent and assess the effects of unusual intervention events on the
behavior of a time series Y. In Chapter 14 the concepts and methods of bivariate time series
analysis and transfer function modeling are extended to the general study of dynamic
relationships among several time series through development of statistical models and methods
of multivariate time series analysis.



11 


TRANSFER FUNCTION MODELS 


In this chapter, we introduce a class of discrete linear transfer function models. These 
models take advantage of the dynamic relationship between two time series for prediction, 
control, and other applications. The models considered can be used to represent commonly 
occurring dynamic situations and are parsimonious in their use of parameters. 


11.1 LINEAR TRANSFER FUNCTION MODELS 

We assume that pairs of observations $(X_t, Y_t)$ are available at equispaced intervals of time
of an input X and an output Y from some dynamic system, as illustrated in Figure 11.1. In
some situations, both X and Y are essentially continuous but are observed only at discrete
times. It then makes sense to consider not only what the data have to tell us about the model
representing transfer from one discrete series to another, but also what the discrete model
might be able to tell us about the corresponding continuous model. In other examples,
the discrete series are all that exist, and there is no underlying continuous process. Where
we relate continuous and discrete systems, we shall use the basic sampling interval as
the unit of time. That is, periods of time will be measured by the number of sampling
intervals they occupy. Also, a discrete observation $X_t$ will be deemed to have occurred
"at time $t$."

When we consider the value of a continuous variable, say Y at time $t$, we denote it by
$Y(t)$. If $t$ happens to be a time at which a discrete variable Y is observed, its value is denoted
by $Y_t$. When we wish to emphasize the dependence of a discrete output Y, not only on time
but also on the level of the input X, we write $Y_t(X)$.

FIGURE 11.1 Input to, and output from, a dynamic system.


11.1.1 Discrete Transfer Function 

With suitable inputs and outputs, which are left to the imagination of the reader, the dynamic 
system of Figure 11.1 might represent an industrial process, the economy of a country, or 
the behavior of a particular corporation or government agency. 

From time to time, we refer to the steady-state level of the output obtained when the
input is held at some fixed value. By this, we mean the value $Y_\infty(X)$ at which the
discrete output from a stable system eventually comes to equilibrium when the input is held
at the fixed level X. Very often, over the range of interest, the relationship between $Y_\infty(X)$
and X will be approximately linear. Hence, if we use Y and X to denote deviations from
convenient origins situated on the line, we can write the steady-state relationship as

$$Y_\infty = gX \tag{11.1.1}$$

where $g$ is called the steady-state gain, and it is understood that $Y_\infty$ is a function of X.

Now, suppose the level of the input is being varied and that $X_t$ and $Y_t$ represent deviations
at time $t$ from equilibrium. Then, it frequently happens that to an adequate approximation,
the inertia of the system can be represented by a linear filter of the form

$$Y_t = v_0 X_t + v_1 X_{t-1} + v_2 X_{t-2} + \cdots = (v_0 + v_1 B + v_2 B^2 + \cdots) X_t = v(B) X_t \tag{11.1.2}$$

in which the output deviation at some time $t$ is represented as a linear aggregate of input
deviations at times $t, t - 1, \ldots$. The operator $v(B)$ is called the transfer function of the filter.

Impulse Response Function. The weights $v_0, v_1, v_2, \ldots$ in (11.1.2) are called the impulse
response function of the system. This is because the $v_j$ may be regarded as the output or
response at times $j \ge 0$ to a unit pulse input at time 0, that is, an input $X_t$ such that $X_t = 1$
if $t = 0$, $X_t = 0$ otherwise. The impulse response function is shown in Figure 11.1 in the
form of a bar chart. When there is no immediate response, one or more of the initial $v$'s,
say $v_0, v_1, \ldots, v_{b-1}$, will be equal to zero.

According to (11.1.2), the output deviation can be regarded as a linear aggregate of a
series of superimposed impulse response functions scaled by the deviations $X_t$. This is
illustrated in Figure 11.2, which shows a hypothetical impulse response function and the
transfer it induces from the input to the output. In the situation illustrated, the input and
output are initially in equilibrium. The deviations that occur in the input at times $t = 1$,
$t = 2$, and $t = 3$ produce impulse response patterns of deviations in the output, which add
together to produce the overall output response.

FIGURE 11.2 Linear transfer from input $X_t$ to output $Y_t$.
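The superposition described above can be checked numerically. The following R sketch applies the linear filter $Y_t = \sum_j v_j X_{t-j}$ directly and then rebuilds the same output by adding scaled, shifted copies of the impulse response. The function name `linear_transfer` and all numerical values are hypothetical, and the input is assumed to be in equilibrium (zero deviation) before the first time point.

```r
# Output deviations Y_t = sum_j v_j X_{t-j}; v[1] holds v_0.
# The input is taken to be zero before the first observation.
linear_transfer <- function(x, v) {
  n <- length(x)
  y <- numeric(n)
  for (i in seq_len(n)) {
    for (j in seq_along(v)) {
      lag <- j - 1
      if (i - lag >= 1) y[i] <- y[i] + v[j] * x[i - lag]
    }
  }
  y
}

v <- c(0, 0, 0.5, 0.3, 0.2)          # hypothetical weights, first two equal to zero
pulse <- c(1, rep(0, 7))             # unit pulse at the first time point
linear_transfer(pulse, v)            # reproduces the impulse response itself

x <- c(1, 2, -1, rep(0, 5))          # input deviations at three successive times
y_direct <- linear_transfer(x, v)

# superposition: scaled and shifted copies of the impulse response, added together
y_super <- 1 * linear_transfer(pulse, v) +
           2 * linear_transfer(c(0, pulse[-8]), v) -
           1 * linear_transfer(c(0, 0, pulse[-(7:8)]), v)
all.equal(y_direct, y_super)
```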


Relation Between the Incremental Changes. Denote by

$$y_t = Y_t - Y_{t-1} = \nabla Y_t$$

and by

$$x_t = X_t - X_{t-1} = \nabla X_t$$

the incremental changes in Y and X. We often wish to relate such changes. On differencing
(11.1.2), we obtain

$$y_t = v(B) x_t$$

Thus, we see that the incremental changes $y_t$ and $x_t$ satisfy the same transfer function model
as do $Y_t$ and $X_t$.

Stability. If the infinite series $v_0 + v_1 B + v_2 B^2 + \cdots$ converges for $|B| \le 1$, or equivalently,
if the $v_j$ are absolutely summable, so that $\sum_{j=0}^{\infty} |v_j| < \infty$, then the system is said to be
stable. We shall be concerned here only with stable systems and consequently, impose this
condition on the models we study. The stability condition implies that a finite incremental
change in the input results in a finite incremental change in the output.

Now, suppose that X is held indefinitely at the value +1. Then, according to (11.1.1), Y
will adjust and maintain itself at the value $g$. On substituting in (11.1.2) the values $Y_t = g$,
$1 = X_t = X_{t-1} = X_{t-2} = \cdots$, we obtain

$$\sum_{j=0}^{\infty} v_j = g \tag{11.1.3}$$

Thus, for a stable system the sum of the impulse response weights converges and is equal
to the steady-state gain of the system.
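A small numerical check of (11.1.3) is sketched below in R. It assumes, purely for illustration, geometrically decaying weights $v_j = \omega_0 \delta_1^j$ with hypothetical values $\omega_0 = 2$ and $\delta_1 = 0.6$, for which the sum of the weights is $\omega_0/(1 - \delta_1)$.

```r
delta1 <- 0.6
omega0 <- 2                        # hypothetical values
v <- omega0 * delta1^(0:200)       # v_j = omega0 * delta1^j, absolutely summable
g <- sum(v)                        # close to omega0 / (1 - delta1)

# Hold the input at +1 from the first time point on (zero before that):
# the output at the t-th time point is then v_0 + v_1 + ... + v_{t-1}.
n <- 60
y <- cumsum(v[1:n])
c(gain = g, closed_form = omega0 / (1 - delta1), y_final = y[n])
```

The output approaches the sum of the weights, so the steady-state gain can be read off either from the limit of the response or from the summed impulse response.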

Parsimony. It would often be unsatisfactory to parameterize the system in terms of the
$v_j$'s of (11.1.2). The use of that many parameters could, at the estimation stage, lead
to inaccurate and unstable estimation of the transfer function. Furthermore, it is usually
inappropriate to estimate the weights $v_j$ directly because for many real situations the $v_j$'s
would be functionally related, as we now see.

11.1.2 Continuous Dynamic Models Represented by Differential Equations 

First-Order Dynamic System. Consider Figure 11.3. Suppose that at time $t$, $X(t)$ is the
volume of water in tank A and $Y_1(t)$ the volume of water in tank B, which is connected to A
by a pipe. For the time being we ignore tank C, shown by dashed lines. Now suppose that
water can be forced in or out of A through pipe P and that mechanical devices are available
that make it possible to force the level and hence the volume X in A to follow any desired
pattern irrespective of what happens in B.

Now if the volume X in the first tank is held at some fixed level, water will flow from
one tank to the other until the levels are equal. If we now reset the volume X to some
other value, again a flow between the tanks will occur until equilibrium is reached. The
volume in B at equilibrium as a function of the fixed volume in A yields the steady-state
relationship

$$Y_{1\infty} = g_1 X \tag{11.1.4}$$

In this case the steady-state gain $g_1$ physically represents the ratio of the cross-sectional
areas of the two tanks. If the levels are not in equilibrium at some time $t$, it is to be noted
that the difference in the water level between the tanks is proportional to $g_1 X(t) - Y_1(t)$.

FIGURE 11.3 Representation of a simple dynamic system.

Suppose now that by forcing liquid in and out of pipe P, the volume $X(t)$ is made to
follow a pattern like that labeled "Input $X(t)$" in Figure 11.3. Then, the volume $Y_1(t)$ in B
will correspondingly change in some pattern such as that labeled on the figure as "Output
$Y_1(t)$." In general, the function $X(t)$ that is responsible for driving the system is called the
forcing function.

To relate output to input, we note that to a close approximation, the rate of flow through
the pipe will be proportional to the difference in head. That is,

$$\frac{dY_1}{dt} = \frac{1}{T_1}\left[g_1 X(t) - Y_1(t)\right] \tag{11.1.5}$$

where $T_1$ is a constant. The differential equation (11.1.5) may be rewritten in the form

$$(1 + T_1 D) Y_1(t) = g_1 X(t) \tag{11.1.6}$$

where $D = d/dt$. The dynamic system so represented by a first-order differential equation is
often referred to as a first-order dynamic system. The constant $T_1$ is called the time constant
of the system. The same first-order model can approximately represent the behavior of many
simple systems. For example, $Y_1(t)$ might be the outlet temperature of water from a water
heater, and $X(t)$ the flow rate of water into the heater.

It is possible to show (see, e.g., Jenkins and Watts, 1968) that the solution of a linear
differential equation such as (11.1.6) can be written in the form

$$Y_1(t) = \int_0^{\infty} v(u) X(t - u)\,du \tag{11.1.7}$$

where in general $v(u)$ is the (continuous) impulse response function. We see that $Y_1(t)$ is
generated from $X(t)$ as a continuously weighted aggregate, just as $Y_t$ is generated from $X_t$
as a discretely weighted aggregate in (11.1.2). Furthermore, we see that the role of weight
function played by $v(u)$ in the continuous case is precisely parallel to that played by $v_j$ in
the discrete situation. For the particular first-order system defined by (11.1.6),

$$v(u) = g_1 T_1^{-1} e^{-u/T_1}$$

Thus, the impulse response in this case undergoes simple exponential decay, as indicated
in Figure 11.3.

In the continuous case, determination of the output for a completely arbitrary forcing
function, such as shown in Figure 11.3, is normally accomplished by simulation on an
analog computer, or by using numerical procedures on a digital machine. Solutions are
available analytically only for special forcing functions. Suppose, for example, that with
the hydraulic system empty, $X(t)$ was suddenly raised to a level $X(t) = 1$ and maintained
at that value. Then, we shall refer to the forcing function, which was at a steady level of
zero and changed instantaneously to a steady level of unity, as a (unit) step function. The
response of the system to such a function, called the step response of the system, is derived
by solving the differential equation (11.1.6) with a unit step input, to obtain

$$Y_1(t) = g_1\left(1 - e^{-t/T_1}\right) \tag{11.1.8}$$

Thus, the level in tank B rises exponentially in the manner shown in Figure 11.4. Now,
when $t = T_1$, $Y_1(t) = g_1(1 - e^{-1}) = 0.632 g_1$. Thus, the time constant $T_1$ is the time required
after the initiation of a step input for the first-order system (11.1.6) to reach 63.2% of its
final equilibrium level.

FIGURE 11.4 Response of a first-order system to a unit step change.
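The link between the impulse response and the step response of the first-order system can be checked numerically. The R sketch below integrates $v(u) = g_1 T_1^{-1} e^{-u/T_1}$ from 0 to $T_1$ with the base function `integrate` and compares the result with the closed form $g_1(1 - e^{-t/T_1})$ evaluated at $t = T_1$. The values of $g_1$ and $T_1$ are hypothetical.

```r
g1 <- 5
T1 <- 2                                             # hypothetical gain and time constant
impulse   <- function(u) g1 / T1 * exp(-u / T1)     # v(u) for the first-order system
step_resp <- function(t) g1 * (1 - exp(-t / T1))    # closed-form unit step response

# For a unit step input, the convolution integral reduces to the
# integral of v(u) from 0 to t; here t = T1.
num <- integrate(impulse, lower = 0, upper = T1)$value
c(numeric = num, analytic = step_resp(T1), fraction_of_gain = step_resp(T1) / g1)
```

The last entry is $1 - e^{-1}$, the fraction of the final level reached after one time constant.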

Sometimes there is an initial period of pure delay or dead time before the response to
a given input change begins to take effect. For example, if there were a long length of
pipe between A and B in Figure 11.3, a sudden change in level in A could not begin to
take effect until liquid had flowed down the pipe. Suppose that the delay thus introduced
occupies $\tau$ units of time. Then, the response of the delayed system would be represented by
a differential equation like (11.1.6), but with $t - \tau$ replacing $t$ on the right-hand side, so that

$$(1 + T_1 D) Y_1(t) = g_1 X(t - \tau) \tag{11.1.9}$$

The corresponding impulse and step response functions for this system would be of
precisely the same shape as for the undelayed system, but the functions would be translated
along the horizontal axis a distance $\tau$.

Second-Order Dynamic System. Consider Figure 11.3 once more. Imagine a three-tank
system in which a pipe leads from tank B to a third tank C, the volume of liquid in which is
denoted by $Y_2(t)$. Let $T_2$ be the time constant for the additional system and $g_2$ its steady-state
gain. Then, $Y_2(t)$ and $Y_1(t)$ are related by the differential equation

$$(1 + T_2 D) Y_2(t) = g_2 Y_1(t)$$

After substitution in (11.1.6), we obtain a second-order differential equation linking the
output from the third tank and the input to the first:

$$[1 + (T_1 + T_2) D + T_1 T_2 D^2] Y_2(t) = g X(t) \tag{11.1.10}$$

where $g = g_1 g_2$. For such a system, the impulse response function is a mixture of two
exponentials

$$v(u) = \frac{g\left(e^{-u/T_1} - e^{-u/T_2}\right)}{T_1 - T_2} \tag{11.1.11}$$

and the response to a unit step is given by

$$Y_2(t) = g\left(1 - \frac{T_1 e^{-t/T_1} - T_2 e^{-t/T_2}}{T_1 - T_2}\right) \tag{11.1.12}$$

The continuous curve R in Figure 11.5 shows the response to a unit step for the system

$$(1 + 3D + 2D^2) Y_2(t) = 5X(t)$$

for which $T_1 = 1$, $T_2 = 2$, $g = 5$. Note that unlike the first-order system, the second-order
system has a step response that has zero slope initially.

A more general second-order system is defined by

$$(1 + \Xi_1 D + \Xi_2 D^2) Y(t) = g X(t) \tag{11.1.13}$$

where

$$\Xi_1 = T_1 + T_2 \qquad \Xi_2 = T_1 T_2 \tag{11.1.14}$$

and the constants $T_1$ and $T_2$ may be complex. If we write

$$T_1 = \frac{1}{\zeta} e^{i\lambda} \qquad T_2 = \frac{1}{\zeta} e^{-i\lambda} \tag{11.1.15}$$

then (11.1.13) becomes

$$\left(1 + \frac{2\cos\lambda}{\zeta} D + \frac{1}{\zeta^2} D^2\right) Y(t) = g X(t) \tag{11.1.16}$$

The impulse response function (11.1.11) then reduces to

$$v(u) = g\,\frac{\zeta e^{-\zeta u \cos\lambda} \sin(\zeta u \sin\lambda)}{\sin\lambda} \tag{11.1.17}$$

and the response (11.1.12) to a unit step, to

$$Y(t) = g\left[1 - \frac{e^{-\zeta t \cos\lambda} \sin(\zeta t \sin\lambda + \lambda)}{\sin\lambda}\right] \tag{11.1.18}$$

FIGURE 11.5 Step responses of coincident, discrete, and continuous second-order systems having
characteristic equations with real roots (curve R) and complex roots (curve C).

The continuous curve C in Figure 11.5 shows the response to a unit step for the system

$$(1 + \sqrt{2} D + 2D^2) Y(t) = 5X(t)$$

for which $\lambda = \pi/3$ and $\zeta = \sqrt{2}/2$. It will be noticed that the response overshoots the value
$g = 5$ and then comes to equilibrium as a damped sine wave. This behavior is typical of
underdamped systems, as they are called. In general, a second-order system is said to be
overdamped, critically damped, or underdamped, depending on whether the constants $T_1$
and $T_2$ are real, real and equal, or complex. The overdamped system has a step response
that is a mixture of two exponentials, given by (11.1.12), and will always remain below the
asymptote $Y(\infty) = g$. As with the first-order system, the response can be made subject to a
period of dead time by replacing $t$ on the right-hand side of (11.1.13) by $t - \tau$. Many quite
complicated dynamic systems can be closely approximated by such second-order systems
with delay.
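The two step responses discussed above can be evaluated numerically. The R sketch below codes the mixture-of-exponentials form of the unit step response so that it accepts complex time constants, then compares it with the damped sine wave form for the underdamped example. The function name `step2` and the time grid are hypothetical choices; the parameter values are those of the two example systems.

```r
# Unit step response of the second-order system, written so that the
# time constants T1 and T2 may be real or complex conjugates.
step2 <- function(t, g, T1, T2) {
  Re(g * (1 - (T1 * exp(-t / T1) - T2 * exp(-t / T2)) / (T1 - T2)))
}

t <- seq(0, 20, by = 0.05)

# overdamped example: (1 + 3D + 2D^2) Y(t) = 5 X(t), with T1 = 1, T2 = 2
y_over <- step2(t, g = 5, T1 = 1, T2 = 2)

# underdamped example: lambda = pi/3, zeta = sqrt(2)/2
lambda <- pi / 3
zeta <- sqrt(2) / 2
T1 <- complex(modulus = 1 / zeta, argument = lambda)
T2 <- Conj(T1)
y_under <- step2(t, g = 5, T1 = T1, T2 = T2)

# damped sine wave form of the same response, for comparison
y_trig <- 5 * (1 - exp(-zeta * t * cos(lambda)) *
                 sin(zeta * t * sin(lambda) + lambda) / sin(lambda))

c(max_over = max(y_over), max_under = max(y_under),
  max_abs_diff = max(abs(y_under - y_trig)))
```

The overdamped response stays below the asymptote $g = 5$, whereas the underdamped response exceeds it before settling.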

More elaborate linear dynamic systems can be represented by allowing not only the level
of the forcing function $X(t)$ but also its rate of change $dX/dt$ and higher derivatives to
influence the behavior of the system. Thus, a general model for representing (continuous)
dynamic systems is the linear differential equation

$$(1 + \Xi_1 D + \cdots + \Xi_R D^R) Y(t) = g(1 + H_1 D + \cdots + H_S D^S) X(t - \tau) \tag{11.1.19}$$


11.2 DISCRETE DYNAMIC MODELS REPRESENTED BY DIFFERENCE 
EQUATIONS 

11.2.1 General Form of the Difference Equation 

Corresponding to the continuous representation (11.1.19), discrete dynamic systems are
often parsimoniously represented by the general linear difference equation

$$(1 + \xi_1 \nabla + \cdots + \xi_r \nabla^r) Y_t = g(1 + \eta_1 \nabla + \cdots + \eta_s \nabla^s) X_{t-b} \tag{11.2.1}$$

which we refer to as a transfer function model of order $(r, s)$. The difference equation
(11.2.1) may also be written in terms of the backward shift operator $B$, with $\nabla = 1 - B$, as

$$(1 - \delta_1 B - \cdots - \delta_r B^r) Y_t = (\omega_0 - \omega_1 B - \cdots - \omega_s B^s) X_{t-b} \tag{11.2.2}$$

or as

$$\delta(B) Y_t = \omega(B) X_{t-b}$$

Equivalently, writing $\Omega(B) = \omega(B) B^b$, the model becomes

$$\delta(B) Y_t = \Omega(B) X_t \tag{11.2.3}$$

Comparing (11.2.3) with (11.1.2) we see that the transfer function for this model is

$$v(B) = \delta^{-1}(B)\Omega(B) \tag{11.2.4}$$

Thus, the transfer function is represented by the ratio of two polynomial operators in $B$.

Dynamics of ARIMA Stochastic Models. The ARIMA model

$$\varphi(B) z_t = \theta(B) a_t$$

used for the representation of a time series $\{z_t\}$ relates $z_t$ and $a_t$ by the linear filtering
operation

$$z_t = \varphi^{-1}(B)\theta(B) a_t$$

where $a_t$ is white noise. Thus, the ARIMA model postulates that a time series can be
usefully represented as an output from a dynamic system to which the input is white noise
and for which the transfer function can be parsimoniously expressed as the ratio of two
polynomial operators in $B$.

Stability of the Discrete Models. The requirement of stability for the discrete transfer
function models exactly parallels that of stationarity for the ARMA stochastic models. In
general, for stability we require that the roots of the characteristic equation

$$\delta(B) = 0$$

with $B$ regarded as a variable, lie outside the unit circle. In particular, this implies that for
the first-order model with $\delta(B) = 1 - \delta_1 B$, the parameter $\delta_1$ satisfies

$$-1 < \delta_1 < 1$$

and for the second-order model (see, e.g., Fig. 11.5), the parameters $\delta_1$, $\delta_2$ satisfy

$$\begin{aligned} \delta_2 + \delta_1 &< 1 \\ \delta_2 - \delta_1 &< 1 \\ -1 < \delta_2 &< 1 \end{aligned}$$

On writing (11.2.2) in full as

$$Y_t = \delta_1 Y_{t-1} + \cdots + \delta_r Y_{t-r} + \omega_0 X_{t-b} - \omega_1 X_{t-b-1} - \cdots - \omega_s X_{t-b-s}$$

we see that if $X_t$ is held indefinitely at a value +1, $Y_t$ will eventually reach the value

$$g = \frac{\omega_0 - \omega_1 - \cdots - \omega_s}{1 - \delta_1 - \cdots - \delta_r} \tag{11.2.5}$$

which expresses the steady-state gain in terms of the parameters of the model.
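The stability requirement and the gain formula can be sketched in a few lines of R. The helper names `is_stable` and `steady_state_gain` are hypothetical, the parameter values are made up for illustration, and `polyroot` from base R is used to find the roots of $\delta(B) = 0$.

```r
# delta = c(delta_1, ..., delta_r), omega = c(omega_0, omega_1, ..., omega_s)
is_stable <- function(delta) {
  # roots of 1 - delta_1 B - ... - delta_r B^r must lie outside the unit circle
  all(Mod(polyroot(c(1, -delta))) > 1)
}

steady_state_gain <- function(delta, omega) {
  # (omega_0 - omega_1 - ... - omega_s) / (1 - delta_1 - ... - delta_r)
  (omega[1] - sum(omega[-1])) / (1 - sum(delta))
}

is_stable(c(0.6, -0.4))    # satisfies all three second-order conditions
is_stable(c(0.8, 0.3))     # violates delta_2 + delta_1 < 1
steady_state_gain(delta = c(0.6, -0.4), omega = c(1, -0.5, -0.25))
```

The gain is meaningful only when the stability check passes, since otherwise the output does not settle at a finite level.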
11.2.2 Nature of the Transfer Function

If we employ a transfer function model defined by the difference equation (11.2.2), then
substituting

$$Y_t = v(B) X_t \tag{11.2.6}$$

in (11.2.2), we obtain the identity

$$(1 - \delta_1 B - \delta_2 B^2 - \cdots - \delta_r B^r)(v_0 + v_1 B + v_2 B^2 + \cdots) = (\omega_0 - \omega_1 B - \cdots - \omega_s B^s) B^b \tag{11.2.7}$$

On equating coefficients of $B$, we find

$$v_j = \begin{cases} 0 & j < b \\ \delta_1 v_{j-1} + \delta_2 v_{j-2} + \cdots + \delta_r v_{j-r} + \omega_0 & j = b \\ \delta_1 v_{j-1} + \delta_2 v_{j-2} + \cdots + \delta_r v_{j-r} - \omega_{j-b} & j = b + 1, b + 2, \ldots, b + s \\ \delta_1 v_{j-1} + \delta_2 v_{j-2} + \cdots + \delta_r v_{j-r} & j > b + s \end{cases} \tag{11.2.8}$$

The weights $v_{b+s}, v_{b+s-1}, \ldots, v_{b+s-r+1}$ supply $r$ starting values for the homogeneous
difference equation

$$\delta(B) v_j = 0 \qquad j > b + s$$

The solution $v_j = f(\delta, \omega, j)$ of this difference equation applies to all values $v_j$ for which
$j \ge b + s - r + 1$.

Thus, in general, the impulse response weights consist of:

1. $b$ zero values $v_0, v_1, \ldots, v_{b-1}$.

2. A further $s - r + 1$ values $v_b, v_{b+1}, \ldots, v_{b+s-r}$ following no fixed pattern (no such
values occur if $s < r$).

3. Values $v_j$ with $j \ge b + s - r + 1$ following the pattern dictated by the
$r$th-order difference equation, which has $r$ starting values $v_{b+s}, v_{b+s-1}, \ldots,
v_{b+s-r+1}$. Starting values $v_j$ for $j < b$ will, of course, be zero.
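The recursion for the impulse response weights obtained by equating coefficients of $B$ can be sketched as follows in R. The function name `impulse_weights` and the parameter values are hypothetical; `omega` is assumed to hold $(\omega_0, \omega_1, \ldots, \omega_s)$ and `delta` to hold $(\delta_1, \ldots, \delta_r)$.

```r
# Impulse response weights v_0, ..., v_n from the difference-equation form.
# v[j + 1] stores v_j; weights with negative index are treated as zero.
impulse_weights <- function(delta, omega, b, n) {
  r <- length(delta)
  s <- length(omega) - 1
  v <- numeric(n + 1)
  for (j in 0:n) {
    ar_part <- 0
    for (i in seq_len(r)) {
      if (j - i >= 0) ar_part <- ar_part + delta[i] * v[j - i + 1]
    }
    k <- j - b
    ma_part <- 0
    if (k == 0) ma_part <- omega[1]                     # + omega_0 at j = b
    if (k >= 1 && k <= s) ma_part <- -omega[k + 1]      # - omega_{j-b} for j = b+1..b+s
    v[j + 1] <- ar_part + ma_part
  }
  v
}

# r = 1, s = 2, b = 3 with hypothetical parameters
v <- impulse_weights(delta = 0.5, omega = c(1, -0.4, -0.2), b = 3, n = 8)
round(v, 5)
```

In this example the first $b = 3$ weights are zero, $v_3$ and $v_4$ follow no fixed pattern, and from $v_5 = v_{b+s}$ onward each weight is $\delta_1 = 0.5$ times the previous one.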

Step Response. We now write $V(B)$ for the generating function of the step response
weights $V_j$, which represent the response at times $j \ge 0$ to a unit step at time 0, $X_t = 1$ if
$t \ge 0$, $X_t = 0$ if $t < 0$, so that $V_j = \sum_{i=0}^{j} v_i$ for $j \ge 0$. Thus,

$$V(B) = V_0 + V_1 B + V_2 B^2 + \cdots = v_0 + (v_0 + v_1) B + (v_0 + v_1 + v_2) B^2 + \cdots \tag{11.2.9}$$

and

$$v(B) = (1 - B) V(B) \tag{11.2.10}$$

Substitution of (11.2.10) in (11.2.7) yields the identity

$$(1 - \delta_1^* B - \delta_2^* B^2 - \cdots - \delta_{r+1}^* B^{r+1})(V_0 + V_1 B + V_2 B^2 + \cdots) = (\omega_0 - \omega_1 B - \cdots - \omega_s B^s) B^b \tag{11.2.11}$$

with

$$(1 - \delta_1^* B - \delta_2^* B^2 - \cdots - \delta_{r+1}^* B^{r+1}) = (1 - B)(1 - \delta_1 B - \cdots - \delta_r B^r) \tag{11.2.12}$$

The identity (11.2.11) for the step response weights $V_j$ exactly parallels the identity (11.2.7)
for the impulse response weights, except that the left-hand operator $\delta^*(B)$ is of order $r + 1$
instead of $r$.

Using the results (11.2.8), it follows that the step response function is defined by:

1. $b$ zero values $V_0, V_1, \ldots, V_{b-1}$.

2. A further $s - r$ values $V_b, V_{b+1}, \ldots, V_{b+s-r-1}$ following no fixed pattern (no such
values occur if $s < r + 1$).

3. Values $V_j$, with $j \ge b + s - r$, which follow the pattern dictated by the $(r + 1)$th-order
difference equation $\delta^*(B) V_j = 0$, which has $r + 1$ starting values $V_{b+s},
V_{b+s-1}, \ldots, V_{b+s-r}$. Starting values $V_j$ for $j < b$ will, of course, be zero.
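The step response weights can be obtained numerically as cumulative sums of the impulse response weights. The R sketch below generates the impulse weights by passing a unit pulse through the difference equation with `stats::filter` in recursive mode and then accumulates them. The parameter values are hypothetical, and the code assumes $r \ge 1$.

```r
delta <- c(0.6, -0.4)     # hypothetical delta_1, delta_2
omega <- c(1, -0.5)       # hypothetical omega_0, omega_1
b <- 3
n <- 200

# right-hand side omega(B) B^b applied to a unit pulse at time 0:
# b zeros, then omega_0, -omega_1, ..., -omega_s, then zeros
rhs <- c(rep(0, b), omega[1], -omega[-1], rep(0, n - b - length(omega) + 1))

# recursive filter: v_j = rhs_j + delta_1 v_{j-1} + ... + delta_r v_{j-r}
v <- as.numeric(stats::filter(rhs, delta, method = "recursive"))
V <- cumsum(v)            # step weights V_j = v_0 + ... + v_j

gain <- (omega[1] - sum(omega[-1])) / (1 - sum(delta))
c(V_last = V[n + 1], gain = gain)
```

For a stable model the step weights level off at the steady-state gain, and differencing `V` recovers the impulse weights, in line with $v(B) = (1 - B)V(B)$.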

11.2.3 First- and Second-Order Discrete Transfer Function Models 

Details of transfer function models for all combinations of $r = 0, 1, 2$ and $s = 0, 1, 2$ are shown in Table 11.1. Specific examples of the models, with bar charts showing step response and impulse response, are given in Figure 11.6. The equations at the end of Table 11.1 allow the parameters $\xi$, $g$, $\eta$ of the $\nabla$ form of the model to be expressed in terms of the parameters $\delta$, $\omega$ of the $B$ form. These equations are given for the most general of the models considered, namely that for which $r = 2$ and $s = 2$. All the other models are special cases of this one, and the corresponding equations for these are obtained by setting appropriate parameters to zero. For example, if $r = 1$ and $s = 1$, $\xi_2 = \eta_2 = \delta_2 = \omega_2 = 0$, then

$$\delta_1 = \frac{\xi_1}{1 + \xi_1} \qquad \omega_0 = \frac{g(1 + \eta_1)}{1 + \xi_1} \qquad \omega_1 = \frac{g\eta_1}{1 + \xi_1}$$

In Figure 11.6, the starting values for the difference equations satisfied by the impulse and step responses, respectively, are indicated by circles on the bar charts.
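
The conversion between the two parameterizations follows from substituting $\nabla = 1 - B$ and dividing through by $1 + \xi_1 + \xi_2$. The sketch below is an illustration under that assumption rather than code from the text; the function names are hypothetical, and the test case $(1 - 0.25\nabla + 0.5\nabla^2)Y_t = (1 - \nabla + 0.25\nabla^2)X_{t-3}$ is one of the Figure 11.6 examples with $g = 1$.

```python
def nabla_to_b(xi1, xi2, eta1, eta2, g):
    """Map nabla-form parameters to (delta1, delta2, omega0, omega1, omega2)."""
    denom = 1.0 + xi1 + xi2
    delta1 = (xi1 + 2.0 * xi2) / denom
    delta2 = -xi2 / denom
    omega0 = g * (1.0 + eta1 + eta2) / denom
    omega1 = g * (eta1 + 2.0 * eta2) / denom
    omega2 = -g * eta2 / denom
    return delta1, delta2, omega0, omega1, omega2


def b_to_nabla(delta1, delta2, omega0, omega1, omega2):
    """Inverse map, returning (xi1, xi2, eta1, eta2, g)."""
    denom = 1.0 - delta1 - delta2        # equals 1 / (1 + xi1 + xi2)
    total = omega0 - omega1 - omega2     # equals g * denom
    xi1 = (delta1 + 2.0 * delta2) / denom
    xi2 = -delta2 / denom
    eta1 = (omega1 + 2.0 * omega2) / total
    eta2 = -omega2 / total
    g = total / denom
    return xi1, xi2, eta1, eta2, g


b_params = nabla_to_b(-0.25, 0.5, -1.0, 0.25, 1.0)
round_trip = b_to_nabla(*b_params)
print(b_params)
print(round_trip)
```

For this example the $B$ form is $(1 - 0.6B + 0.4B^2)Y_t = (0.2 + 0.4B + 0.2B^2)B^3X_t$, that is, $\delta_1 = 0.6$, $\delta_2 = -0.4$, $\omega_0 = 0.2$, $\omega_1 = -0.4$, $\omega_2 = -0.2$.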

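Discussion of the Models in Table 11.1. The models, whose properties are summarized in Table 11.1 and Figure 11.6, will require careful study, since they are useful in representing many commonly met dynamic systems. In all the models the operator $B^b$ on the right ensures that the first nonzero term in the impulse response function is $v_b$. In the examples in Figure 11.6, the value of $g$ is assumed to equal 1, and $b$ is assumed to equal 3.

Models with r = 0. With $r$ and $s$ both equal to zero, the impulse response consists of a single nonzero value $v_b = \omega_0 = g$. The output is proportional to the input but is displaced by $b$ time intervals. More generally, if we have an operator of order $s$ on the right, the instantaneous input will be delayed $b$ intervals and will be spread over $s + 1$ values in proportion to $v_b = \omega_0$, $v_{b+1} = -\omega_1, \ldots, v_{b+s} = -\omega_s$. The step response is obtained by summing the impulse response and eventually satisfies the difference equation $(1 - B)V_j = 0$ with starting values $V_{b+s} = g = \omega_0 - \omega_1 - \cdots - \omega_s$.


TABLE 11.1 Impulse Response Functions for Transfer Function Models of the Form $\delta_r(B)Y_t = \omega_s(B)B^bX_t$

r s b = 0 0 b
    $\nabla$ form: $Y_t = gX_{t-b}$
    $B$ form: $Y_t = \omega_0B^bX_t$
    Impulse response $v_j$: $0$ ($j < b$); $\omega_0$ ($j = b$); $0$ ($j > b$)

r s b = 0 1 b
    $\nabla$ form: $Y_t = g(1 + \eta_1\nabla)X_{t-b}$
    $B$ form: $Y_t = (\omega_0 - \omega_1B)B^bX_t$
    Impulse response $v_j$: $0$ ($j < b$); $\omega_0$ ($j = b$); $-\omega_1$ ($j = b + 1$); $0$ ($j > b + 1$)

r s b = 0 2 b
    $\nabla$ form: $Y_t = g(1 + \eta_1\nabla + \eta_2\nabla^2)X_{t-b}$
    $B$ form: $Y_t = (\omega_0 - \omega_1B - \omega_2B^2)B^bX_t$
    Impulse response $v_j$: $0$ ($j < b$); $\omega_0$ ($j = b$); $-\omega_1$ ($j = b + 1$); $-\omega_2$ ($j = b + 2$); $0$ ($j > b + 2$)

r s b = 1 0 b
    $\nabla$ form: $(1 + \xi_1\nabla)Y_t = gX_{t-b}$
    $B$ form: $(1 - \delta_1B)Y_t = \omega_0B^bX_t$
    Impulse response $v_j$: $0$ ($j < b$); $\omega_0$ ($j = b$); $\delta_1v_{j-1}$ ($j > b$)

r s b = 1 1 b
    $\nabla$ form: $(1 + \xi_1\nabla)Y_t = g(1 + \eta_1\nabla)X_{t-b}$
    $B$ form: $(1 - \delta_1B)Y_t = (\omega_0 - \omega_1B)B^bX_t$
    Impulse response $v_j$: $0$ ($j < b$); $\omega_0$ ($j = b$); $\delta_1\omega_0 - \omega_1$ ($j = b + 1$); $\delta_1v_{j-1}$ ($j > b + 1$)

r s b = 1 2 b
    $\nabla$ form: $(1 + \xi_1\nabla)Y_t = g(1 + \eta_1\nabla + \eta_2\nabla^2)X_{t-b}$
    $B$ form: $(1 - \delta_1B)Y_t = (\omega_0 - \omega_1B - \omega_2B^2)B^bX_t$
    Impulse response $v_j$: $0$ ($j < b$); $\omega_0$ ($j = b$); $\delta_1\omega_0 - \omega_1$ ($j = b + 1$); $\delta_1^2\omega_0 - \delta_1\omega_1 - \omega_2$ ($j = b + 2$); $\delta_1v_{j-1}$ ($j > b + 2$)

r s b = 2 0 b
    $\nabla$ form: $(1 + \xi_1\nabla + \xi_2\nabla^2)Y_t = gX_{t-b}$
    $B$ form: $(1 - \delta_1B - \delta_2B^2)Y_t = \omega_0B^bX_t$
    Impulse response $v_j$: $0$ ($j < b$); $\omega_0$ ($j = b$); $\delta_1v_{j-1} + \delta_2v_{j-2}$ ($j > b$)

r s b = 2 1 b
    $\nabla$ form: $(1 + \xi_1\nabla + \xi_2\nabla^2)Y_t = g(1 + \eta_1\nabla)X_{t-b}$
    $B$ form: $(1 - \delta_1B - \delta_2B^2)Y_t = (\omega_0 - \omega_1B)B^bX_t$
    Impulse response $v_j$: $0$ ($j < b$); $\omega_0$ ($j = b$); $\delta_1\omega_0 - \omega_1$ ($j = b + 1$); $\delta_1v_{j-1} + \delta_2v_{j-2}$ ($j > b + 1$)

r s b = 2 2 b
    $\nabla$ form: $(1 + \xi_1\nabla + \xi_2\nabla^2)Y_t = g(1 + \eta_1\nabla + \eta_2\nabla^2)X_{t-b}$
    $B$ form: $(1 - \delta_1B - \delta_2B^2)Y_t = (\omega_0 - \omega_1B - \omega_2B^2)B^bX_t$
    Impulse response $v_j$: $0$ ($j < b$); $\omega_0$ ($j = b$); $\delta_1\omega_0 - \omega_1$ ($j = b + 1$); $(\delta_1^2 + \delta_2)\omega_0 - \delta_1\omega_1 - \omega_2$ ($j = b + 2$); $\delta_1v_{j-1} + \delta_2v_{j-2}$ ($j > b + 2$)

Parameter relations for $r = 2$, $s = 2$:

$$\delta_1 = \frac{\xi_1 + 2\xi_2}{1 + \xi_1 + \xi_2} \qquad \xi_1 = \frac{\delta_1 + 2\delta_2}{1 - \delta_1 - \delta_2}$$

$$\delta_2 = \frac{-\xi_2}{1 + \xi_1 + \xi_2} \qquad \xi_2 = \frac{-\delta_2}{1 - \delta_1 - \delta_2}$$

$$\omega_0 = \frac{g(1 + \eta_1 + \eta_2)}{1 + \xi_1 + \xi_2} \qquad g = \frac{\omega_0 - \omega_1 - \omega_2}{1 - \delta_1 - \delta_2}$$

$$\omega_1 = \frac{g(\eta_1 + 2\eta_2)}{1 + \xi_1 + \xi_2} \qquad \eta_1 = \frac{\omega_1 + 2\omega_2}{\omega_0 - \omega_1 - \omega_2}$$

$$\omega_2 = \frac{-g\eta_2}{1 + \xi_1 + \xi_2} \qquad \eta_2 = \frac{-\omega_2}{\omega_0 - \omega_1 - \omega_2}$$

$$1 - \delta_1 - \delta_2 = (1 + \xi_1 + \xi_2)^{-1}$$


FIGURE 11.6 Examples of impulse and step response functions with gain $g = 1$. The panels show the impulse response $v_j$ and the step response $V_j = \sum_{i=0}^{j} v_i$ for the following models:

r s b = 0 0 3    $\nabla$ form: $Y_t = X_{t-3}$    $B$ form: $Y_t = B^3X_t$
r s b = 0 1 3    $\nabla$ form: $Y_t = (1 - 0.5\nabla)X_{t-3}$    $B$ form: $Y_t = (0.5 + 0.5B)B^3X_t$
r s b = 0 2 3    $\nabla$ form: $Y_t = (1 - \nabla + 0.25\nabla^2)X_{t-3}$    $B$ form: $Y_t = (0.25 + 0.50B + 0.25B^2)B^3X_t$
r s b = 1 0 3    $\nabla$ form: $(1 + \nabla)Y_t = X_{t-3}$    $B$ form: $(1 - 0.5B)Y_t = 0.5B^3X_t$
r s b = 1 1 3    $\nabla$ form: $(1 + \nabla)Y_t = (1 - 0.5\nabla)X_{t-3}$    $B$ form: $(1 - 0.5B)Y_t = (0.25 + 0.25B)B^3X_t$
r s b = 1 2 3    $\nabla$ form: $(1 + \nabla)Y_t = (1 - \nabla + 0.25\nabla^2)X_{t-3}$    $B$ form: $(1 - 0.5B)Y_t = (0.125 + 0.25B + 0.125B^2)B^3X_t$
r s b = 2 0 3    $\nabla$ form: $(1 - 0.25\nabla + 0.5\nabla^2)Y_t = X_{t-3}$    $B$ form: $(1 - 0.6B + 0.4B^2)Y_t = 0.8B^3X_t$
r s b = 2 1 3    $\nabla$ form: $(1 - 0.25\nabla + 0.5\nabla^2)Y_t = (1 - 0.5\nabla)X_{t-3}$    $B$ form: $(1 - 0.6B + 0.4B^2)Y_t = (0.4 + 0.4B)B^3X_t$
r s b = 2 2 3    $\nabla$ form: $(1 - 0.25\nabla + 0.5\nabla^2)Y_t = (1 - \nabla + 0.25\nabla^2)X_{t-3}$    $B$ form: $(1 - 0.6B + 0.4B^2)Y_t = (0.2 + 0.4B + 0.2B^2)B^3X_t$


Models with r = 1. With $s = 0$, the impulse response tails off exponentially (geometrically) from the initial starting value $v_b = \omega_0 = g/(1 + \xi_1) = g(1 - \delta_1)$. The step response increases exponentially until it attains the value $g = 1$. If the exponential step response is extrapolated backwards as indicated by the dashed line, it cuts the time axis at time $b - 1$. This corresponds to the fact that $V_{b-1} = 0$ as well as $V_b = v_b$ are starting values for the appropriate difference equation $(1 - \delta_1B)(1 - B)V_j = 0$.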

With $s = 1$, there is an initial value $v_b = \omega_0 = g(1 + \eta_1)/(1 + \xi_1)$ of the impulse response, which does not follow a pattern. The exponential pattern induced by the difference equation $v_j = \delta_1v_{j-1}$ associated with the left-hand operator begins with the starting value $v_{b+1} = (\delta_1\omega_0 - \omega_1) = g(\xi_1 - \eta_1)/(1 + \xi_1)^2$. The step response function follows an exponential curve, determined by the difference equation $(1 - \delta_1B)(1 - B)V_j = 0$, which approaches $g$ asymptotically from the starting values $V_b = v_b$ and $V_{b+1} = v_b + v_{b+1}$. An exponential curve projected by the dashed line backwards through the points will, in general, cut the time axis at some intermediate point in the time interval. We show in Section 11.3 that certain discrete models, which approximate continuous first-order systems having fractional periods of delay, may in fact be represented by a first-order difference equation with an operator of order $s = 1$ on the right.

With $s = 2$, there are two values $v_b$ and $v_{b+1}$ for the impulse response that do not follow a pattern, followed by exponential fall off beginning with $v_{b+2}$. Correspondingly, there is a single preliminary value $V_b$ in the step response that does not coincide with the exponential curve projected by the dashed line. This curve is, as before, determined by the difference equation $(1 - \delta_1B)(1 - B)V_j = 0$ but with starting values $V_{b+1}$ and $V_{b+2}$.

Models with r = 2. The flexibility of the model with $s = 0$ is limited because the first starting value of the impulse response is fixed to be zero. More useful models are obtained for $s = 1$ and $s = 2$. The use of these models in approximating continuous second-order systems is discussed in Section 11.3 and in Appendix A11.1.

The behavior of the dynamic weights $v_j$, which eventually satisfy

$$v_j - \delta_1v_{j-1} - \delta_2v_{j-2} = 0 \qquad j > b + s \qquad (11.2.13)$$

depends on the nature of the roots $S_1^{-1}$ and $S_2^{-1}$ of the characteristic equation

$$1 - \delta_1B - \delta_2B^2 = (1 - S_1B)(1 - S_2B) = 0$$

This dependence is shown in Table 11.2. As in the continuous case, the model may be 
overdamped, critically damped, or underdamped, depending on the nature of the roots of 
the characteristic equation. 

TABLE 11.2 Dependence of Nature of Second-Order System on the Roots of $1 - \delta_1B - \delta_2B^2 = 0$

Roots $(S_1^{-1}, S_2^{-1})$    Condition                       Damping
Real                            $\delta_1^2 + 4\delta_2 > 0$    Overdamped
Real and equal                  $\delta_1^2 + 4\delta_2 = 0$    Critically damped
Complex                         $\delta_1^2 + 4\delta_2 < 0$    Underdamped

When the roots are complex, the solution of (11.2.13) will follow a damped sine wave, as in the examples of second-order systems in Figure 11.6. When the roots are real, the solution will be the sum of two exponentials. As in the continuous case considered in Section 11.1.2, the system can then be thought of as equivalent to two discrete first-order systems arranged in series and having parameters $S_1$ and $S_2$.
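
The classification by the sign of $\delta_1^2 + 4\delta_2$ can be sketched in a few lines of Python. This is an illustrative helper, not from the text: the function name and the tolerance `tol` are assumptions. It also returns $S_1$ and $S_2$, the roots of $S^2 - \delta_1S - \delta_2 = 0$, which follow from $\delta_1 = S_1 + S_2$ and $\delta_2 = -S_1S_2$.

```python
import cmath


def classify_second_order(delta1, delta2, tol=1e-12):
    """Classify the damping of 1 - delta1*B - delta2*B^2 and return (label, S1, S2)."""
    disc = delta1 ** 2 + 4.0 * delta2
    if disc > tol:
        label = "overdamped"
    elif disc < -tol:
        label = "underdamped"
    else:
        label = "critically damped"
    # S1, S2 solve S^2 - delta1*S - delta2 = 0 (complex arithmetic covers all cases).
    root = cmath.sqrt(disc)
    s1 = (delta1 + root) / 2.0
    s2 = (delta1 - root) / 2.0
    return label, s1, s2


label_real, s1_real, s2_real = classify_second_order(0.97, -0.22)
label_cplx, s1_cplx, s2_cplx = classify_second_order(1.15, -0.49)
label_crit, _, _ = classify_second_order(1.0, -0.25)
print(label_real, label_cplx, label_crit)
```

In the underdamped case $S_1$ and $S_2$ are complex conjugates with modulus $\sqrt{-\delta_2}$, which sets the rate at which the damped sine wave decays.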

The weights $V_j$ for the step response eventually satisfy a difference equation

$$(V_j - g) - \delta_1(V_{j-1} - g) - \delta_2(V_{j-2} - g) = 0$$

which is of the same form as (11.2.13). Thus, the behavior of the step response $V_j$ about
its asymptotic value $g$ parallels the behavior of the impulse response about the time axis.
In the situation where there are complex roots, the step response “overshoots” the value 
g and then oscillates about this value until it reaches equilibrium. When the roots are real 
and positive, the step response, which is the sum of two exponential terms, approaches its 
asymptote g without crossing it. However, if there are negative real roots, the step response 
may overshoot and oscillate as it settles down to its equilibrium value. 

In Figure 11.5, the dots indicate two discrete step responses, labeled R and C, respectively, in relation to a discrete step input indicated by dots at the bottom of the figure. The difference equation models¹ corresponding to R and C are

$$R: \quad (1 - 0.97B + 0.22B^2)Y_t = 5(0.15 + 0.09B)X_{t-1}$$

$$C: \quad (1 - 1.15B + 0.49B^2)Y_t = 5(0.19 + 0.15B)X_{t-1}$$

Also shown in Figure 11.5 is a diagram of the stability region with the parameter points $(\delta_1, \delta_2)$ marked for each of the two models. Note that the system described by model R, which has real positive roots, has no overshoot while that for model C, which has complex roots, does have overshoot.
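
The overshoot behavior of models R and C can be reproduced by running each difference equation on a unit step. The sketch below is illustrative only: the function names are hypothetical, and the numerator is passed as plain polynomial coefficients `num0 + num1*B` rather than in the $\omega$ sign convention. Note that with the two-decimal coefficients quoted for R, the steady-state gain works out to 4.8 rather than exactly 5.

```python
def step_response(delta1, delta2, num0, num1, n):
    """Unit-step response of (1 - delta1*B - delta2*B^2) Y_t = (num0 + num1*B) X_{t-1}.

    The step is X_t = 1 for t >= 0 and 0 otherwise; Y is zero before t = 0.
    """
    def x(t):
        return 1.0 if t >= 0 else 0.0

    y = []
    for t in range(n):
        y1 = y[t - 1] if t >= 1 else 0.0
        y2 = y[t - 2] if t >= 2 else 0.0
        y.append(delta1 * y1 + delta2 * y2 + num0 * x(t - 1) + num1 * x(t - 2))
    return y


def steady_state_gain(delta1, delta2, num0, num1):
    """Value the step response settles to (set B = 1 on both sides)."""
    return (num0 + num1) / (1.0 - delta1 - delta2)


resp_R = step_response(0.97, -0.22, 5 * 0.15, 5 * 0.09, 30)
resp_C = step_response(1.15, -0.49, 5 * 0.19, 5 * 0.15, 30)
gain_R = steady_state_gain(0.97, -0.22, 5 * 0.15, 5 * 0.09)
gain_C = steady_state_gain(1.15, -0.49, 5 * 0.19, 5 * 0.15)
print(max(resp_R), gain_R)
print(max(resp_C), gain_C)
```

The response of R rises monotonically toward its steady state, whereas the response of C exceeds its steady state within a few periods before settling.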

11.2.4 Recursive Computation of Output for Any Input 

It would be extremely tedious if it were necessary to use the impulse response form (11.1.2) of the model to compute the output for a given input. Fortunately, this is not necessary. Instead, we may employ the difference equation model directly. In this way it is a simple matter to compute the output recursively for any input. For example, consider the model with $r = 1$, $s = 0$, $b = 1$, and with $\xi = 1$ and $g = 5$. Thus,

$$(1 + \nabla)Y_t = 5X_{t-1}$$

or equivalently,

$$(1 - 0.5B)Y_t = 2.5X_{t-1} \qquad (11.2.14)$$

Table 11.3 shows the calculation of $Y_t$ when the input $X_t$ is (a) a unit pulse input, (b) a unit step input, and (c) a “general” input. In all cases, it is assumed that the output has the initial value $Y_0 = 0$. To perform the recursive calculation, the difference equation is written out with $Y_t$ on the left. Thus,

$$Y_t = 0.5Y_{t-1} + 2.5X_{t-1}$$

and, for example, in the case of the “general” input

$$Y_1 = 0.5 \times 0 + 2.5 \times 0 = 0$$
$$Y_2 = 0.5 \times 0 + 2.5 \times 1.5 = 3.75$$
$$Y_3 = 0.5 \times 3.75 + 2.5 \times 0.5 = 3.125$$

and so on. These inputs and outputs are plotted in Figure 11.7(a), (b), and (c).


TABLE 11.3 Calculation of Output from Discrete First-Order System for Impulse, Step, and General Input

         (a) Impulse Input         (b) Step Input            (c) General Input
  t      Input     Output $Y_t$    Input     Output $Y_t$    Input     Output $Y_t$
  0      0         0               0         0               0         0
  1      1         0               1         0               1.5       0
  2      0         2.50            1         2.50            0.5       3.75
  3      0         1.25            1         3.75            2.0       3.12
  4      0         0.62            1         4.38            1.0       6.56
  5      0         0.31            1         4.69            -2.5      5.78
  6      0         0.16            1         4.84            0.5       -3.36


¹ The parameters in these models were in fact selected, in a manner to be discussed in Section 11.3.2, so that at the discrete points, the step responses exactly matched those of the continuous systems introduced in Section 11.1.2.
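
The recursive calculation can be sketched as a short Python routine. This is a minimal illustration, not code from the text: `recursive_output` is a hypothetical name, values of $X$ and $Y$ before $t = 0$ are assumed to be zero, and `omega` follows the sign convention $\omega(B) = \omega_0 - \omega_1B - \cdots - \omega_sB^s$. The three inputs are those of the impulse, step, and "general" cases for model (11.2.14).

```python
def recursive_output(delta, omega, b, x):
    """Compute Y_t = sum_i delta_i Y_{t-i} + omega_0 X_{t-b} - omega_1 X_{t-b-1} - ...

    delta = [delta_1, ..., delta_r], omega = [omega_0, ..., omega_s];
    all pre-sample values of X and Y are taken to be zero.
    """
    y = []
    for t in range(len(x)):
        ar = sum(d * y[t - i] for i, d in enumerate(delta, start=1) if t - i >= 0)
        ma = 0.0
        for k, w in enumerate(omega):
            lag = t - b - k
            if lag >= 0:
                ma += (w if k == 0 else -w) * x[lag]
        y.append(ar + ma)
    return y


impulse = [0, 1, 0, 0, 0, 0, 0]
step = [0, 1, 1, 1, 1, 1, 1]
general = [0, 1.5, 0.5, 2.0, 1.0, -2.5, 0.5]

y_impulse = recursive_output([0.5], [2.5], 1, impulse)
y_step = recursive_output([0.5], [2.5], 1, step)
y_general = recursive_output([0.5], [2.5], 1, general)
print(y_general)
```

The unrounded outputs for the general input are 0, 0, 3.75, 3.125, 6.5625, 5.78125, and -3.359375.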

In general, we see that having written the transfer function model in the form

$$Y_t = \delta_1Y_{t-1} + \cdots + \delta_rY_{t-r} + \omega_0X_{t-b} - \omega_1X_{t-b-1} - \cdots - \omega_sX_{t-b-s}$$

it is an easy matter to compute the discrete output for any discrete input. To start off the recursion, we need to know certain initial values. This need is not, of course, a shortcoming of the method of calculation but comes about because with a transfer function model, the initial values of $Y$ will depend on values of $X$ that occurred before observation was begun. In practice, when the necessary initial values are not known, we can substitute mean values for unknown $Y$'s and $X$'s (zeros if these quantities are considered as deviations from their means). The early calculated values will then depend upon this choice of the starting values. However, for a stable system, the effect of this choice will be negligible after a period sufficient for the impulse response to become negligible. If this period is $p_0$ time intervals, an alternative procedure is to compute $Y_{p_0}, Y_{p_0+1}, \ldots$ directly from the impulse response until enough values are available to set the recursion going.


11.2.5 Transfer Function Models with Added Noise 

In practice, the output $Y$ could not be expected to follow exactly the pattern determined by the transfer function model, even if that model were entirely adequate. Disturbances of various kinds other than $X$ normally corrupt the system. A disturbance might originate at any point in the system, but it is often convenient to consider it in terms of its net effect on the output $Y$, as indicated in Figure 1.5. If we assume that the disturbance, or noise $N_t$, is independent of the level of $X$ and is additive with respect to the influence of $X$, we can write

$$Y_t = \delta^{-1}(B)\omega(B)X_{t-b} + N_t \qquad (11.2.15)$$

If the noise process $N_t$ can be represented by an ARIMA$(p, d, q)$ model

$$N_t = \varphi^{-1}(B)\theta(B)a_t$$

where $a_t$ is white noise, the model (11.2.15) can be written finally as

$$Y_t = \delta^{-1}(B)\omega(B)X_{t-b} + \varphi^{-1}(B)\theta(B)a_t \qquad (11.2.16)$$

In Chapter 12, we describe methods for identifying, fitting, and checking combined transfer function-noise models of the form (11.2.16).

FIGURE 11.7 Response of a first-order system to (a) an impulse, (b) a step, and (c) a “general” input.


11.3 RELATION BETWEEN DISCRETE AND CONTINUOUS MODELS 

The discrete dynamic model, defined by a linear difference equation, is of importance in its 
own right. It provides a sensible class of transfer functions and needs no other justification. 
In many examples, no question will arise of attempting to relate the discrete model to a 
supposed underlying continuous model because no underlying continuous series properly 
exists. However, in some cases, for example, where instantaneous observations are taken 
periodically on a chemical reactor, the discrete record can be used to tell us something 
about the continuous system. In particular, control engineers are used to thinking in terms of the time constants and dead times of continuous systems and may best understand the results of the discrete model analysis when so expressed.

As before, we denote a continuous output and input at time $t$ by $Y(t)$ and $X(t)$, respectively. Suppose that the output and input are related by the linear filtering operation

$$Y(t) = \int_0^\infty v(u)X(t - u)\,du$$

Suppose now that only discrete observations $(X_t, Y_t), (X_{t-1}, Y_{t-1}), \ldots$ of output and input are available at equispaced intervals of time $t, t - 1, \ldots$ and that the discrete output and input are related by the discrete linear filter

$$Y_t = \sum_{j=0}^{\infty} v_jX_{t-j}$$

Then, for certain special cases, and with appropriate assumptions, useful relationships may be established between the discrete and continuous models.


11.3.1 Response to a Pulsed Input 

A special case, which is of importance in the design of the discrete control schemes 
discussed in Part Four, arises when the opportunity for adjustment of the process occurs 
immediately after observation of the output, so that the input variable is allowed to remain 
at the same level between observations. The typical appearance of the resulting square 
wave, or pulsed input as we shall call it, is shown in Figure 11.8. We denote the fixed level at which the input is held during the period $t - 1 < \tau < t$ by $X_{t-1+}$.

Consider a continuous linear system that has $b$ whole periods of delay plus a fractional period $c$ of further delay. Thus, in terms of previous notation, $b + c = \tau$. Then, we can represent the output from the system as

$$Y(t) = \int_0^\infty v(u)X(t - u)\,du$$

where the impulse response function $v(u)$ is zero for $u < b + c$. Now for a pulsed input, as shown in Figure 11.9, the output at time $t$ will be given exactly by

$$Y(t) = \left[\int_{b+c}^{b+1} v(u)\,du\right]X_{t-b-1+} + \left[\int_{b+1}^{b+2} v(u)\,du\right]X_{t-b-2+} + \cdots$$

Thus,

$$Y(t) = Y_t = v_bX_{t-b-1+} + v_{b+1}X_{t-b-2+} + \cdots$$

Therefore, for a pulsed input, there exists a discrete linear filter that is such that at times $t, t - 1, t - 2, \ldots$, the continuous output $Y(t)$ exactly equals the discrete output.

FIGURE 11.8 Example of a pulsed input.

FIGURE 11.9 Transfer to output from a pulsed input.

Given a pulsed input, consider the output $Y_t$ from a discrete model

$$\xi(\nabla)Y_t = \eta(\nabla)X_{t-b-1+} \qquad (11.3.1)$$

of order $(r, r)$ in relation to the continuous output from the $R$th-order model

$$(1 + \Xi_1D + \Xi_2D^2 + \cdots + \Xi_RD^R)Y(t) = X(t - b - c) \qquad (11.3.2)$$

subject to the same input. It is shown in Appendix A11.1 that for suitably chosen values of the parameters $(\Xi, c)$, the outputs will coincide exactly if $R = r$. Furthermore, if $c = 0$, the output from the continuous model (11.3.2) will be identical at the discrete times with that of a discrete model (11.3.1) of order $(r, r - 1)$. We refer to the related continuous and discrete models as discretely coincident systems. If, then, a discrete model of the form (11.3.1) of order $(r, r)$ has been obtained, then on the assumption that the continuous model would be represented by the $r$th-order differential equation (11.3.2), the parameters, and in particular the time constants for the discretely coincident continuous system, may be written explicitly in terms of the parameters of the discrete model.

The parameter relationships for a delayed second-order system have been derived in 
Appendix A11.1. From these, the corresponding relationships for simpler systems may be 
obtained by setting appropriate constants equal to zero, as we shall now discuss.

11.3.2 Relationships for First- and Second-Order Coincident Systems

Undelayed First-Order System.

B Form. The continuous system satisfying

$$(1 + TD)Y(t) = gX(t) \qquad (11.3.3)$$

is, for a pulsed input, discretely coincident with the discrete system satisfying

$$(1 - \delta B)Y_t = \omega_0X_{t-1+} \qquad (11.3.4)$$

where

$$\delta = e^{-1/T} \qquad T = (-\ln\delta)^{-1} \qquad \omega_0 = g(1 - \delta) \qquad (11.3.5)$$

$\nabla$ Form. Alternatively, the difference equation may be written

$$(1 + \xi\nabla)Y_t = gX_{t-1+} \qquad (11.3.6)$$

where

$$\xi = \frac{\delta}{1 - \delta} \qquad (11.3.7)$$

To illustrate, we reconsider the example of Section 11.2.4 for the “general” input. The output for this case is calculated in Table 11.3(c) and plotted in Figure 11.7(c). Suppose that, in fact, we had a continuous system:

$$(1 + 1.44D)Y(t) = 5X(t)$$

Then this would be discretely coincident with the discrete model (11.2.14) actually considered, namely,

$$(1 - 0.5B)Y_t = 2.5X_{t-1+}$$

If the input and output were continuous and the input were pulsed, the actual course of the response would be that shown by the continuous lines in Figure 11.10. The output would in fact follow a series of exponential curves. Each dashed line shows the further course that the response would take if no further change in the input were made. The curves correspond exactly at the discrete sample points with the discrete output already calculated in Table 11.3(c) and plotted in Figure 11.7(c).
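
The relations (11.3.5) and (11.3.7) are simple enough to evaluate directly. The following Python sketch is illustrative, with hypothetical function names; it takes $T = 1/\ln 2 \approx 1.44$ and $g = 5$, the values of the example above.

```python
import math


def first_order_coincident(T, g):
    """Return (delta, omega0, xi) for the continuous system (1 + T*D) Y(t) = g X(t)."""
    delta = math.exp(-1.0 / T)
    omega0 = g * (1.0 - delta)
    xi = delta / (1.0 - delta)
    return delta, omega0, xi


def time_constant(delta):
    """Invert delta = exp(-1/T) to recover the time constant T."""
    return -1.0 / math.log(delta)


delta, omega0, xi = first_order_coincident(1.0 / math.log(2.0), 5.0)
print(delta, omega0, xi)
print(time_constant(0.5))
```

This recovers $\delta = 0.5$, $\omega_0 = 2.5$, and $\xi = 1$, in agreement with (11.2.14).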



FIGURE 11.10 Continuous response of the system $(1 + 1.44D)Y(t) = 5X(t)$ to a pulsed input.

Delayed First-Order System.

B Form. The continuous system satisfying

$$(1 + TD)Y(t) = gX(t - b - c) \qquad (11.3.8)$$

is, for a pulsed input, discretely coincident with the discrete system satisfying

$$(1 - \delta B)Y_t = (\omega_0 - \omega_1B)X_{t-b-1+} \qquad (11.3.9)$$

where

$$\delta = e^{-1/T} \qquad \omega_0 = g(1 - \delta^{1-c}) \qquad \omega_1 = g(\delta - \delta^{1-c}) \qquad (11.3.10)$$

$\nabla$ Form. Alternatively, the difference equation may be written

$$(1 + \xi\nabla)Y_t = g(1 + \eta\nabla)X_{t-b-1+} \qquad (11.3.11)$$

where

$$\xi = \frac{\delta}{1 - \delta} \qquad -\eta = \frac{\delta(\delta^{-c} - 1)}{1 - \delta} \qquad (11.3.12)$$

Now

$$(1 + \eta\nabla)X_{t-b-1+} = (1 + \eta)X_{t-b-1+} - \eta X_{t-b-2+} \qquad (11.3.13)$$

can be regarded as an interpolation at an increment $(-\eta)$ between $X_{t-b-1+}$ and $X_{t-b-2+}$. Table 11.4 allows the corresponding parameters $(\xi, -\eta)$ and $(T, c)$ of the discrete and continuous models to be determined for a range of alternatives.
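
The correspondence between $(T, c)$ and $(\xi, -\eta)$ can be computed directly from (11.3.10) and (11.3.12). The sketch below is an illustration with a hypothetical function name; the example values $\delta = 0.5$ (that is, $T = 1/\ln 2$) and $c = 0.5$ are chosen for convenience.

```python
import math


def delayed_first_order(T, c, g=1.0):
    """Discrete parameters coincident with (1 + T*D) Y(t) = g X(t - b - c)."""
    delta = math.exp(-1.0 / T)
    omega0 = g * (1.0 - delta ** (1.0 - c))
    omega1 = g * (delta - delta ** (1.0 - c))
    xi = delta / (1.0 - delta)
    neg_eta = delta * (delta ** (-c) - 1.0) / (1.0 - delta)
    return delta, omega0, omega1, xi, neg_eta


g, c = 1.0, 0.5
delta, omega0, omega1, xi, neg_eta = delayed_first_order(1.0 / math.log(2.0), c, g)
print(round(xi, 2), round(neg_eta, 2))

# Steady-state gain of the discrete model should equal g.
gain = (omega0 - omega1) / (1.0 - delta)
print(gain)
```

For these values $-\eta = \sqrt{2} - 1 \approx 0.41$ and $\xi = 1$. The second weight of the discrete filter, $\delta\omega_0 - \omega_1$, equals $g\delta^{1-c}(1 - \delta)$, the integral of the continuous impulse response over the second sampling interval.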

TABLE 11.4 Values of $-\eta$ for Various Values of $T$ and $c$ for a First-Order System with Delay; Corresponding Values of $\xi$ and $\delta$

                                  $-\eta$ for
  $\delta$   $\xi$    $T$      c = 0.9   c = 0.7   c = 0.5   c = 0.3   c = 0.1
  0.9        9.00     9.49     0.90      0.69      0.49      0.29      0.10
  0.8        4.00     4.48     0.89      0.68      0.47      0.28
  0.7        2.33     2.80     0.88      0.66      0.46      0.26
  0.6        1.50     1.95     0.88      0.64      0.44      0.25      0.08
  0.5        1.00     1.44     0.87      0.62      0.41      0.23
  0.4        0.67     1.09     0.85      0.60      0.39      0.21
  0.3        0.43     0.83     0.84      0.57      0.35      0.19
  0.2        0.25     0.62     0.82      0.52      0.31      0.15
  0.1        0.11     0.43     0.77      0.45      0.24      0.11


Undelayed Second-Order System.

B Form. The continuous system satisfying

$$(1 + T_1D)(1 + T_2D)Y(t) = gX(t) \qquad (11.3.14)$$

is, for a pulsed input, discretely coincident with the system

$$(1 - \delta_1B - \delta_2B^2)Y_t = (\omega_0 - \omega_1B)X_{t-1+} \qquad (11.3.15)$$

or equivalently, with the system

$$(1 - S_1B)(1 - S_2B)Y_t = (\omega_0 - \omega_1B)X_{t-1+} \qquad (11.3.16)$$

where

$$S_1 = e^{-1/T_1} \qquad S_2 = e^{-1/T_2}$$

$$\omega_0 = g(T_1 - T_2)^{-1}[T_1(1 - S_1) - T_2(1 - S_2)] \qquad (11.3.17)$$

$$\omega_1 = g(T_1 - T_2)^{-1}[T_1S_2(1 - S_1) - T_2S_1(1 - S_2)]$$

$\nabla$ Form. Alternatively, the difference equation may be written

$$(1 + \xi_1\nabla + \xi_2\nabla^2)Y_t = g(1 + \eta_1\nabla)X_{t-1+} \qquad (11.3.18)$$

where

$$-\eta_1 = (1 - S_1)^{-1}(1 - S_2)^{-1}(T_1 - T_2)^{-1}[T_2S_1(1 - S_2) - T_1S_2(1 - S_1)] \qquad (11.3.19)$$

may be regarded as the increment of an interpolation between $X_{t-1+}$ and $X_{t-2+}$. Values for $\xi_1$ and $\xi_2$ in terms of the $\delta$'s can be obtained directly using the results given in Table 11.1.

As a specific example, Figure 11.5 shows the step response for two discrete systems we have considered before, together with the corresponding continuous responses from the discretely coincident systems.

The pair of models are, for curve C,

Continuous: $(1 + 1.41D + 2D^2)Y(t) = 5X(t)$

Discrete: $(1 - 1.15B + 0.49B^2)Y_t = 5(0.19 + 0.15B)X_{t-1+}$

and for curve R,

Continuous: $(1 + 2D)(1 + D)Y(t) = 5X(t)$

Discrete: $(1 - 0.97B + 0.22B^2)Y_t = 5(0.15 + 0.09B)X_{t-1+}$

The continuous curves were drawn using (11.1.18) and (11.1.12), which give the continuous step responses for second-order systems having, respectively, complex and real roots.
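
For real time constants, the discrete parameters of the coincident second-order system can be evaluated numerically from $S_h = e^{-1/T_h}$, $\delta_1 = S_1 + S_2$, $\delta_2 = -S_1S_2$, and the expressions for $\omega_0$ and $\omega_1$ in (11.3.17). The following Python sketch is illustrative (the function name is hypothetical) and uses curve R, with $T_1 = 2$, $T_2 = 1$, $g = 5$.

```python
import math


def second_order_coincident(T1, T2, g):
    """Discrete parameters coincident with (1 + T1*D)(1 + T2*D) Y(t) = g X(t), T1 != T2."""
    S1 = math.exp(-1.0 / T1)
    S2 = math.exp(-1.0 / T2)
    delta1 = S1 + S2
    delta2 = -S1 * S2
    omega0 = g * (T1 * (1.0 - S1) - T2 * (1.0 - S2)) / (T1 - T2)
    omega1 = g * (T1 * S2 * (1.0 - S1) - T2 * S1 * (1.0 - S2)) / (T1 - T2)
    return delta1, delta2, omega0, omega1


g = 5.0
delta1, delta2, omega0, omega1 = second_order_coincident(2.0, 1.0, g)
print(round(delta1, 2), round(delta2, 2))
print(round(omega0 / g, 2), round(omega1 / g, 2))

# The discrete steady-state gain should reproduce g.
gain = (omega0 - omega1) / (1.0 - delta1 - delta2)
print(gain)
```

To two decimals this gives $\delta_1 = 0.97$, $\delta_2 = -0.22$, $\omega_0 = 5(0.15)$, and $\omega_1 = -5(0.09)$, matching the discrete model quoted for curve R.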

The discrete representation of the response of a second-order continuous system with 
delay to a pulsed input is given in Appendix A11.1. 


11.3.3 Approximating General Continuous Models by Discrete Models 

Perhaps we should emphasize once more that the discrete transfer function models do not 
need to be justified in terms of, or related to, continuous systems. They are of importance 
in their own right in allowing a discrete output to be calculated from a discrete input. 
However, in some instances, such relationships are of interest. 

For continuous systems, the pulsed input arises of itself in control problems when the convenient way to operate is to make an observation on the output $Y$ and then immediately to make any adjustment that may be needed on the input variable $X$. Thus, the input variable stays at a fixed level between observations, and we have a pulsed input. The relationships established in the previous sections may then be applied immediately. In particular, these relationships indicate that with the notation we have used, the undelayed discrete system is represented by

$$\xi(\nabla)Y_t = \eta(\nabla)X_{t-1+}$$

in which the subscript $t - 1+$ on $X$ is one step behind the subscript $t$ on $Y$.


Use of Discrete Models When Continuous Records Are Available. Even though we have 
a continuous record of input and output, it may be convenient to determine the dynamic 
characteristics of the system by discrete methods, as we describe in Chapter 12. Thus, if 
pairs of values are read off with a sufficiently short sampling interval, very little is lost by 
replacing the continuous record by the discrete one. 

One way in which the discrete results may then be used to approximate the continuous transfer function is to treat the input as though it were pulsed, that is, to treat the input record as if the discrete input observed at time $j$ extended from just after $j - \frac{1}{2}$ to $j + \frac{1}{2}$. Thus, $X(t) = X_j$ $(j - \frac{1}{2} < t \le j + \frac{1}{2})$. We can then relate the discrete result to that of the continuous record by using the pulsed input equations with $X_t$ replacing $X_{t+}$ and with $b + c - \frac{1}{2}$ replacing $b + c$, that is, with one half a time period subtracted from the delay. The continuous record will normally be read at a sufficiently small sampling interval so that sudden changes do not occur between the sampled points. In this case, the approximation will be very close.


APPENDIX A11.1 CONTINUOUS MODELS WITH PULSED INPUTS

We showed in Section 11.3.1 (see also Fig. 11.9) that for a pulsed input, the output from any delayed continuous linear system

$$Y(t) = \int_0^\infty v(u)X(t - u)\,du$$

where $v(u) = 0$, $u < b + c$, is exactly given at the discrete times $t, t - 1, t - 2, \ldots$ by the discrete linear filter

$$Y_t = v(B)X_{t-1+}$$

where the weights $v_0, v_1, \ldots, v_{b-1}$ are zero and the weights $v_b, v_{b+1}, \ldots$ are given by

$$v_b = \int_{b+c}^{b+1} v(u)\,du \qquad (A11.1.1)$$

$$v_{b+j} = \int_{b+j}^{b+j+1} v(u)\,du \qquad j \ge 1 \qquad (A11.1.2)$$

Now suppose that the dynamics of the continuous system is represented by the $R$th-order linear differential equation

$$\Xi(D)Y(t) = gX(t - b - c) \qquad (A11.1.3)$$

which may be written in the form

$$\prod_{h=1}^{R}(1 + T_hD)Y(t) = gX(t - b - c)$$

where $T_1, T_2, \ldots, T_R$ may be real or complex. We now show that for a pulsed input, the output from this continuous system is discretely coincident with that from a discrete difference equation model of order $(r, r)$, or of order $(r, r - 1)$ if $c = 0$. Now $v(u)$ is zero for $u < b + c$ and for $u > b + c$ is in general nonzero and satisfies the differential equation

$$\prod_{h=1}^{R}(1 + T_hD)v(u - b - c) = 0 \qquad u > b + c$$

Thus,

$$v(u) = 0 \qquad u < b + c$$

$$v(u) = \alpha_1e^{-(u-b-c)/T_1} + \alpha_2e^{-(u-b-c)/T_2} + \cdots + \alpha_Re^{-(u-b-c)/T_R} \qquad u > b + c$$

Hence, using (A11.1.1) and (A11.1.2),

$$v_b = \sum_{h=1}^{R}\alpha_hT_h[1 - e^{-(1-c)/T_h}] \qquad (A11.1.4)$$

$$v_{b+j} = \sum_{h=1}^{R}\alpha_hT_h(1 - e^{-1/T_h})e^{c/T_h}e^{-j/T_h} \qquad j \ge 1 \qquad (A11.1.5)$$

It will be noted that in the particular case when $c = 0$, the weights $v_{b+j}$ are given by (A11.1.2) for $j = 0$ as well as for $j > 0$.

Now consider the difference equation model of order $(r, s)$,

$$\delta(B)Y_t = \omega(B)B^bX_{t-1+} \qquad (A11.1.6)$$

If we write

$$\Omega(B) = \omega(B)B^b$$

the discrete transfer function $v(B)$ for this model satisfies

$$\delta(B)v(B) = \Omega(B) \qquad (A11.1.7)$$

As we have observed in (11.2.8), by equating coefficients in (A11.1.7) we obtain $b$ zero weights $v_0, v_1, \ldots, v_{b-1}$, and if $s \ge r$, a further $s - r + 1$ values $v_b, v_{b+1}, \ldots, v_{b+s-r}$ which do not follow a pattern. The weights $v_j$ eventually satisfy

$$\delta(B)v_j = 0 \qquad j > b + s \qquad (A11.1.8)$$

with $v_{b+s}, v_{b+s-1}, \ldots, v_{b+s-r+1}$ supplying the required $r$ starting values. Now write

$$\delta(B) = \prod_{h=1}^{r}(1 - S_hB)$$

where $S_1^{-1}, S_2^{-1}, \ldots, S_r^{-1}$ are the roots of the equation $\delta(B) = 0$. Then, the solution of (A11.1.8) is of the form

$$v_j = A_1(\omega)S_1^j + A_2(\omega)S_2^j + \cdots + A_r(\omega)S_r^j \qquad j > b + s - r \qquad (A11.1.9)$$

where the coefficients $A_h(\omega)$ are suitably chosen so that the solutions of (A11.1.9) for $j = s - r + 1, s - r + 2, \ldots, s$ generate the starting values $v_{b+s-r+1}, \ldots, v_{b+s}$, and the notation $A_h(\omega)$ is used as a reminder that the $A_h$'s are functions of $\omega_0, \omega_1, \ldots, \omega_s$. Thus, if we set $s = r$, for given parameters $(\omega, \delta)$ in (A11.1.6), and hence for given parameters $(\omega, S)$, there will be a corresponding set of values $A_h(\omega)$ $(h = 1, 2, \ldots, r)$ that produce the appropriate $r$ starting values $v_{b+1}, v_{b+2}, \ldots, v_{b+r}$. Furthermore, we know that $v_b = \omega_0$. Thus,

$$v_b = \omega_0 \qquad (A11.1.10)$$

$$v_{b+j} = \sum_{h=1}^{r}A_h(\omega)S_h^j \qquad (A11.1.11)$$

and we can equate the values of the weights in (A11.1.4) and (A11.1.5), which come from the differential equation, to those in (A11.1.10) and (A11.1.11), which come from the difference equation. To do this, we must set

$$R = r \qquad S_h = e^{-1/T_h}$$

and the remaining $r + 1$ equations

$$\omega_0 = \sum_{h=1}^{r}\alpha_hT_h(1 - S_h^{1-c})$$

$$A_h(\omega) = \alpha_hT_h(1 - S_h)S_h^{-c}$$

determine $c, \alpha_1, \alpha_2, \ldots, \alpha_r$ in terms of the $S_h$'s and $\omega_j$'s.

When $c = 0$, we set $s = r - 1$, and for given parameters $(\omega, S)$ in the difference equation, there will then be a set of $r$ values $A_h(\omega)$ that are functions of $\omega_0, \omega_1, \ldots, \omega_{r-1}$, which produce the $r$ starting values $v_b, v_{b+1}, \ldots, v_{b+r-1}$ and which can be equated to the values given by (A11.1.5) for $j = 0, 1, \ldots, r - 1$. To do this, we set

$$R = r \qquad S_h = e^{-1/T_h}$$

and the remaining $r$ equations

$$A_h(\omega) = \alpha_hT_h(1 - S_h)$$

determine $\alpha_1, \alpha_2, \ldots, \alpha_r$ in terms of the $S_h$'s and $\omega_j$'s.

It follows, in general, that for a pulsed input the output at times $t, t - 1, \ldots$ from the continuous $r$th-order dynamic system defined by

$$\Xi(D)Y(t) = gX(t - b - c) \qquad (A11.1.12)$$

is identical to the output from a discrete model

$$\xi(\nabla)Y_t = g\eta(\nabla)X_{t-b-1+} \qquad (A11.1.13)$$

of order $(r, r)$ with the parameters suitably chosen. Furthermore, if $c = 0$, the output from the continuous model (A11.1.12) is identical at the discrete times to that of a model (A11.1.13) of order $(r, r - 1)$.

We now derive the discrete model corresponding to the second-order system with delay, 
from which the results given in Section 11.3.2 may be obtained as special cases. 

Second-Order System with Delay. Suppose that the differential equation relating input and 
output for a continuous system is given by 

(1 + r,D)(l + T 2 D)Y(t) = g X(t -b-c) (All. 1.14) 

Then, the continuous impulse response function is 

v(u) = g(T l -T 2 )- 1 (e- (u - b - c)/T '-e- (u - b ~ c)/T 2) u>b + c (All.1.15) 

For a pulsed input, the output at discrete times t, t — 1, t — 2,... will be related to the 
input by the difference equation 

(1 +£i v + | 2 V 2 )Y, = g(l + i 7 jV + > 7 2 V 2 )X,_ a _ 1+ (All. 1.16) 

with suitably chosen values of the parameters. This difference equation can also be written 
(1 - M - S 2 B 2 )Y, = (®o - a> x B - w 2 B 2 )X,_ b _ 1+ 


or 


(1 - S x B)(l - S 2 B)Y r = (® 0 - co x B - co 2 B 2 )X t _ b _ l+ (All. 1.17) 


Using (All.1.1) and (Al 1.1.2) and writing 

A, = e~ l/T ' S 2 = e _1/I 2 


we obtain 


y rb +1 

1 v(u) du = g{T ! - T 2 )~ l [T x {\ - s\~ c ) - T 2 (] - ^- c )] 

b+c 

fb+j +1 

v b+J = / v(u) du = g(T\ - T 2 y l [T { s; c ( 1 - Aj)^ - T 2 S~ c ( 1 - A 2 )A 2 ] 

J b+i 


j > i 


Thus, 


(T x - T 2 )v(B) = gB»T x [ 1 - S\~ c + S; c ( 1 - Aj)(l - S x B)- l S x B] 

- gB b T 2 [ 1 - + S~ c { 1 - 5 2 )(1 - S 2 B)~ l S 2 B] 

But from (All. 1.17), 


tj(B) = B b (co 0 — co x B — co 2 B 2 ) 


(l-^BKl-AoB) 
Hence, we obtain

$$\begin{aligned}
\omega_0 &= g(T_1 - T_2)^{-1}\left[T_1(1 - S_1^{1-c}) - T_2(1 - S_2^{1-c})\right] \\
\omega_1 &= g(T_1 - T_2)^{-1}\left[(S_1 + S_2)(T_1 - T_2) + T_2 S_2^{1-c}(1 + S_1) - T_1 S_1^{1-c}(1 + S_2)\right] \\
\omega_2 &= gS_1 S_2(T_1 - T_2)^{-1}\left[T_2(1 - S_2^{-c}) - T_1(1 - S_1^{-c})\right]
\end{aligned} \tag{A11.1.18}$$

and

$$\delta_1 = S_1 + S_2 = e^{-1/T_1} + e^{-1/T_2} \qquad \delta_2 = -S_1 S_2 = -e^{-(1/T_1)-(1/T_2)} \tag{A11.1.19}$$
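As a numerical cross-check of (A11.1.18) and (A11.1.19), the following R sketch compares the pulse-response weights obtained by integrating the continuous impulse response (A11.1.15) over unit intervals with the weights generated by the difference equation (A11.1.17). The function names `second_order_discrete` and `v_cont`, the argument name `frac` (standing for the fractional delay c), and the parameter values are illustrative assumptions, not part of the text; lags are measured from the whole delay b.

```r
# Hypothetical helper (not from the text): discrete coefficients of (A11.1.17)
# computed from (A11.1.18)-(A11.1.19); 'frac' plays the role of c.
second_order_discrete <- function(g, T1, T2, frac) {
  S1 <- exp(-1 / T1)
  S2 <- exp(-1 / T2)
  k  <- g / (T1 - T2)
  omega0 <- k * (T1 * (1 - S1^(1 - frac)) - T2 * (1 - S2^(1 - frac)))
  omega1 <- k * ((S1 + S2) * (T1 - T2) +
                 T2 * S2^(1 - frac) * (1 + S1) - T1 * S1^(1 - frac) * (1 + S2))
  omega2 <- k * S1 * S2 * (T2 * (1 - S2^(-frac)) - T1 * (1 - S1^(-frac)))
  list(delta = c(S1 + S2, -S1 * S2), omega = c(omega0, omega1, omega2))
}

# Continuous impulse response (A11.1.15), with u measured from the whole delay b
v_cont <- function(u, g, T1, T2, frac) {
  g / (T1 - T2) * (exp(-(u - frac) / T1) - exp(-(u - frac) / T2))
}

g <- 2; T1 <- 3; T2 <- 1.5; frac <- 0.4      # illustrative values only
p <- second_order_discrete(g, T1, T2, frac)

# Weights v_b, v_{b+1}, ... by numerical integration over unit intervals
nw <- 8
v_int <- sapply(0:(nw - 1), function(j)
  integrate(v_cont, lower = max(j, frac), upper = j + 1,
            g = g, T1 = T1, T2 = T2, frac = frac, rel.tol = 1e-10)$value)

# The same weights from delta(B) v(B) = omega(B) B^b
w <- c(p$omega[1], -p$omega[2], -p$omega[3])  # coefficients of omega(B)
v_rec <- numeric(nw)
for (j in 0:(nw - 1)) {
  ar <- 0
  if (j >= 1) ar <- ar + p$delta[1] * v_rec[j]
  if (j >= 2) ar <- ar + p$delta[2] * v_rec[j - 1]
  v_rec[j + 1] <- ar + (if (j <= 2) w[j + 1] else 0)
}

max(abs(v_int - v_rec))                       # should be numerically negligible
gain <- sum(w) / (1 - sum(p$delta))           # steady-state gain, should equal g
gain
```

The last line uses the fact that the weights sum to the steady-state gain, so $(\omega_0 - \omega_1 - \omega_2)/(1 - \delta_1 - \delta_2)$ should reproduce g.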


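Complex Roots. If $T_1$ and $T_2$ are complex, corresponding expressions are obtained by
substituting

$$T_1 = \zeta^{-1}e^{i\lambda} \qquad T_2 = \zeta^{-1}e^{-i\lambda} \qquad (i^2 = -1)$$

yielding

$$\begin{aligned}
\omega_0 &= g\left\{1 - \frac{e^{-\zeta(1-c)\cos\lambda}\sin[\zeta(1-c)\sin\lambda + \lambda]}{\sin\lambda}\right\} \\
\omega_2 &= g\delta_2\left\{1 - \frac{e^{\zeta c\cos\lambda}\sin(-\zeta c\sin\lambda + \lambda)}{\sin\lambda}\right\} \\
\omega_1 &= \omega_0 - \omega_2 - (1 - \delta_1 - \delta_2)g
\end{aligned} \tag{A11.1.20}$$

where

$$\delta_1 = 2e^{-\zeta\cos\lambda}\cos(\zeta\sin\lambda) \qquad \delta_2 = -e^{-2\zeta\cos\lambda} \tag{A11.1.21}$$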

APPENDIX A11.2 NONLINEAR TRANSFER FUNCTIONS 
AND LINEARIZATION 

The linearity (or additivity) of the transfer function models we have considered implies that
the overall response to the sum of a number of individual inputs will be the sum of the
individual responses to those inputs. Specifically, if $Y_t^{(1)}$ is the response at time t to
an input history $\{X_t^{(1)}\}$ and $Y_t^{(2)}$ is the response at time t to an input history
$\{X_t^{(2)}\}$, the response at time t to an input history $\{X_t^{(1)} + X_t^{(2)}\}$ would be
$Y_t^{(1)} + Y_t^{(2)}$, and similarly
for continuous inputs and outputs. In particular, if the input level is multiplied by some
constant, the output level is multiplied by this same constant. In practice, this assumption 
is probably never quite true, but it supplies a useful approximation for many practical 
situations. 

Models for nonlinear systems may sometimes be obtained by allowing the parameters
to depend upon the level of the input in some prescribed manner. For example, suppose
that a system were being studied over a range where Y had a maximum $\eta$, and for any X
the steady-state relation could be approximated by the quadratic expression

$$Y_\infty = \eta - \tfrac{1}{2}k(\mu - X)^2$$

where Y and X are, as before, deviations from a convenient origin. Then,

$$g(X) = \frac{dY_\infty}{dX} = k(\mu - X)$$

and the dynamic behavior of the system might then be capable of representation by the
first-order difference equation (11.3.4) but with variable gain proportional to $k(\mu - X)$.
Thus,

$$Y_t = \delta Y_{t-1} + k(\mu - X_{t-1+})(1 - \delta)X_{t-1+} \tag{A11.2.1}$$

Dynamics of a Simple Chemical Reactor. It sometimes happens that we can make 
a theoretical analysis of a physical problem that will yield the appropriate form for the 
transfer function. In particular, this allows us to see very specifically what is involved in 
the linearized approximation. 

As an example, suppose that a pure chemical A is continuously fed through a stirred 
tank reactor, and in the presence of a catalyst a certain proportion of it is changed to a 
product B, with no change of overall volume; hence the material continuously leaving the 
reactor consists of a mixture of B and unchanged A. 

Suppose that initially the system is in equilibrium and that with quantities measured in 
suitable units: 

1. $\mu$ is the rate at which A is fed to the reactor (and consequently is also the rate at
which the mixture of A and B leaves the reactor).

2. $\eta$ is the proportion of unchanged A at the outlet, so that $1 - \eta$ is the proportion of the
product B at the outlet.

3. V is the volume of the reactor. 

4. k is a constant determining the rate at which the product B is formed. 

Suppose that the reaction is “first order” with respect to A, which means that the rate
at which B is formed and A is used up is proportional to the amount of A present. Then,
the rate of formation of B is $kV\eta$, but the rate at which B is leaving the outlet is $\mu(1 - \eta)$,
and since the system is in equilibrium,

$$\mu(1 - \eta) = kV\eta \tag{A11.2.2}$$

Now, suppose that the equilibrium of the system is disturbed, the rate of feed to the
reactor at time t being $\mu + X(t)$ and the corresponding concentration of A in the outlet
being $\eta + Y(t)$. Now, the rate of chemical formation of B, which now equals $kV[\eta + Y(t)]$,
will in general no longer exactly balance the rate at which B is flowing out of the system,
which now equals $[\mu + X(t)][1 - \eta - Y(t)]$. The difference in these two quantities is the
rate of increase in the amount of B within the reactor, which equals $-V[dY(t)/dt]$. Thus,

$$-V\frac{dY(t)}{dt} = kV[\eta + Y(t)] - [\mu + X(t)][1 - \eta - Y(t)] \tag{A11.2.3}$$
Using (A11.2.2) and rearranging, (A11.2.3) may be written

$$(kV + \mu + VD)Y(t) = X(t)[1 - \eta - Y(t)]$$

or

$$(1 + TD)Y(t) = g\left(1 - \frac{Y(t)}{1 - \eta}\right)X(t) \tag{A11.2.4}$$

where

$$T = \frac{V}{kV + \mu} \qquad g = \frac{1 - \eta}{kV + \mu} \tag{A11.2.5}$$

Now (A11.2.4) is a nonlinear differential equation, since it contains a term X(t) multiplied
by Y(t). However, in some practical circumstances, it could be adequately approximated
by a linear differential equation, as we now show.

Processes operate under a wide range of conditions, but certainly a not unusual situation
might be one where $100(1 - \eta)$, the percentage conversion of feed A to product B, was,
say, 80%, and $100Y(t)$, the percentage fluctuation that was of practical interest, was 4%.
In this case, the factor $1 - Y(t)/(1 - \eta)$ would vary from 0.95 to 1.05 and, to a good
approximation, could be replaced by unity. The nonlinear differential equation (A11.2.4)
could then be replaced by the linear first-order differential equation

$$(1 + TD)Y(t) = gX(t)$$

where T and g are as defined in Section 11.1.2. If the system was observed at discrete
intervals of time, this equation could be approximated by a linear difference equation.
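To give a feel for the size of the linearization error, the following R sketch integrates the nonlinear equation (A11.2.4) and its linear approximation by a simple Euler scheme after a step change in the feed rate. The numerical values of k, V, the step size X, and the Euler settings are assumptions chosen only so that the conversion is 80% and the linear steady-state response is 0.04; they are not taken from the text.

```r
eta <- 0.2; k <- 4; V <- 1              # assumed values: 80% conversion
mu  <- k * V * eta / (1 - eta)          # equilibrium feed rate from (A11.2.2)
Tc  <- V / (k * V + mu)                 # time constant T in (A11.2.5)
g   <- (1 - eta) / (k * V + mu)         # gain g in (A11.2.5)

dt <- 0.001; n <- 5000                  # Euler step and number of steps (assumed)
X  <- 0.25                              # step change in feed rate (assumed)
Y_nl <- Y_lin <- numeric(n + 1)         # both responses start at equilibrium
for (i in 1:n) {
  # nonlinear model (A11.2.4): T dY/dt = g (1 - Y/(1 - eta)) X - Y
  Y_nl[i + 1]  <- Y_nl[i]  + dt / Tc * (g * (1 - Y_nl[i] / (1 - eta)) * X - Y_nl[i])
  # linear approximation: T dY/dt = g X - Y
  Y_lin[i + 1] <- Y_lin[i] + dt / Tc * (g * X - Y_lin[i])
}
c(nonlinear = Y_nl[n + 1], linear = Y_lin[n + 1])
```

Under these assumed values the linear model settles at gX = 0.04, whereas the nonlinear model settles at 0.04/1.05, a relative difference of about 5%, in line with the factor discussed above.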

Situations can obviously occur when nonlinearities are of importance. This is particularly 
true of optimization studies, where the range of variation for the variables may be large. A 
device that is sometimes useful when the linear assumption is not adequate is to represent the 
dynamics by a set of linear models applicable over different ranges of the input variables. 
This approach could lead to nonlinear transfer function models similar in spirit to the 
threshold AR stochastic models considered in Section 10.3. However, for discrete systems 
it is often less clumsy to work directly with a nonlinear difference equation that can be 
“solved” recursively rather than analytically. For example, we might replace the nonlinear
differential equation (A11.2.4) by the nonlinear difference equation

$$(1 + \xi_1\nabla)Y_t = g(1 + \eta_{12}Y_{t-1})X_{t-1}$$

which has a form analogous to a particular case of the bilinear stochastic models discussed 
in Section 10.3. 


EXERCISES 


11.1. In the following transfer function models, $X_t$ is the methane gas feed rate to a gas
furnace, measured in cubic feet per minute, and $Y_t$ the percent carbon dioxide in the
outlet gas:

(1) $Y_t = 10 + \dfrac{25}{1 - 0.7B}X_{t-1}$

(2) $Y_t = 10 + \dfrac{22 - 12.5B}{1 - 0.85B}X_{t-2}$

(3) $Y_t = 10 + \dfrac{20 - 8.5B}{1 - 1.2B + 0.4B^2}X_{t-3}$


(a) Verify that the models are stable. 

(b) Calculate the steady-state gain g, expressing it in the appropriate units. 


11.2. For each of the models of Exercise 11.1, calculate from the difference equation and 
plot the responses to: 

(a) A unit impulse (0, 1, 0, 0, 0, 0, ...) applied at time t = 0

(b) A unit step (0, 1, 1, 1, 1, 1, ...) applied at time t = 0

(c) A ramp input (0, 1, 2, 3, 4, 5, ...) applied at time t = 0

(d) A periodic input (0, 1, 0, -1, 0, 1, ...) applied at time t = 0

Estimate the period and damping factor of the step response to model (3).

11.3. Use equation (11.2.8) to obtain the impulse weights $v_j$ for each of the models of
Exercise 11.1, and check that they are the same as the impulse response obtained in
Exercise 11.2(a).

11.4. Express the models of Exercise 11.1 in $\nabla$ form.

11.5. (a) Calculate and plot the response of the two-input system

$$Y_t = 10 + \frac{[\ldots]}{1 - 0.7B}X_{1,t-1} + \frac{[\ldots]}{1 - 0.5B}X_{2,t-2}$$

to the orthogonal and randomized input sequences shown below.

 t    x_{1t}   x_{2t}        t    x_{1t}   x_{2t}
 0      0        0           5      1       -1
 1     -1        1           6      1        1
 2      1       -1           7     -1       -1
 3     -1       -1           8     -1        1
 4      1        1

(b) Calculate the gains $g_1$ and $g_2$ of Y with respect to $X_1$ and $X_2$, respectively, and
express the model in $\nabla$ form.
IDENTIFICATION, FITTING, AND
CHECKING OF TRANSFER FUNCTION 
MODELS 


In Chapter 11, a parsimonious class of discrete linear transfer function models was introduced:

$$Y_t - \delta_1 Y_{t-1} - \cdots - \delta_r Y_{t-r} = \omega_0 X_{t-b} - \omega_1 X_{t-b-1} - \cdots - \omega_s X_{t-b-s}$$

or

$$Y_t = \delta^{-1}(B)\omega(B)X_{t-b}$$

where $X_t$ and $Y_t$ are deviations from equilibrium of the system input and output. In practice,
the system will be infected by disturbances, or noise, whose net effect is to corrupt the
output predicted by the transfer function model by an amount $N_t$. The combined transfer
function-noise model may then be written as

$$Y_t = \delta^{-1}(B)\omega(B)X_{t-b} + N_t$$

In this chapter, methods are described for identifying, fitting, and checking
transfer function-noise models when simultaneous pairs of observations $(X_1, Y_1)$,
$(X_2, Y_2), \ldots, (X_N, Y_N)$ of the input and output are available at discrete equispaced times
1, 2, ..., N.

Engineering methods for estimating transfer functions are usually based on the choice
of special inputs to the system, for example, step and sine wave inputs (Young, 1955)
and “pulse” inputs (Hougen, 1964). These methods have been useful when the system is
affected by small amounts of noise but are less satisfactory otherwise. In the presence of
appreciable noise, it is necessary to use statistical methods for estimating the transfer
function. Two previous approaches that have been tried for this problem are direct estimation
of the impulse response in the time domain and direct estimation of the gain and phase
characteristics in the frequency domain, as described, for example, by Briggs et al. (1965),
Hutchinson and Shelton (1967), and Jenkins and Watts (1968). These methods are often
unsatisfactory because they involve the estimation of too many parameters. For example,
to determine the gain and phase characteristics, it is necessary to estimate two parameters
at each frequency. The approach adopted in this chapter is to estimate the parameters
in parsimonious difference equation models. Throughout most of the chapter we assume
that the input $X_t$ is itself a stochastic process. Models of the kind discussed are useful in
representing and forecasting certain multiple time series.


12.1 CROSS-CORRELATION FUNCTION 

In the same way that the autocorrelation function was used to identify stochastic models 
for univariate time series, the data analysis tool employed for the identification of transfer 
function models is the cross-correlation function between the input and output. In this 
section, we describe the basic properties of the cross-correlation function and in the next 
section show how it can be used to identify transfer function models. 

12.1.1 Properties of the Cross-Covariance and Cross-Correlation Functions 

Bivariate Stochastic Processes. We have seen in Chapter 2 that to analyze a time series,
it is useful to regard it as a realization of a hypothetical population of time series called a
stochastic process. Now, suppose that we want to describe an input time series $X_t$ and the
corresponding output time series $Y_t$ from some physical system. For example, Figure 12.1
shows continuous data representing the (coded) input gas feed rate and corresponding
output CO2 concentration from a gas furnace. Then we can regard this pair of time series as
realizations of a hypothetical population of pairs of time series, called a bivariate stochastic
process $(X_t, Y_t)$. We will assume that the data are read off at equispaced times yielding a pair
of discrete time series, generated by a discrete bivariate process, and that values of the time
series at times $t_0 + h, t_0 + 2h, \ldots, t_0 + Nh$ are denoted by $(X_1, Y_1), (X_2, Y_2), \ldots, (X_N, Y_N)$.

In this chapter, we will use the gas furnace data read at intervals of 9 seconds for
illustration. The resulting time series $(X_t, Y_t)$ consist of 296 observations and are listed as
Series J in the Collection of Time Series section in Part Five. Further details about the data
will be given in Section 12.2.2.

Cross-Covariance and Cross-Correlation Functions. We have seen in Chapter 2 that a
stationary Gaussian stochastic process can be described by its mean $\mu$ and autocovariance
function $\gamma_k$, or, equivalently, by its mean $\mu$, variance $\sigma^2$, and autocorrelation function $\rho_k$.
Moreover, since $\gamma_k = \gamma_{-k}$ and $\rho_k = \rho_{-k}$, the autocovariance and autocorrelation functions
need to be considered only for nonnegative values of the lag k = 0, 1, 2, ....

In general, a bivariate stochastic process $(X_t, Y_t)$ need not be stationary. However, as in
Chapter 4, we assume that the appropriately differenced process $(x_t, y_t)$, where $x_t = \nabla^{d_x}X_t$
and $y_t = \nabla^{d_y}Y_t$, is stationary. The stationarity assumption implies in particular that the two
processes $x_t$ and $y_t$ have constant means $\mu_x$ and $\mu_y$ and constant variances $\sigma_x^2$ and $\sigma_y^2$. If,
in addition, it is assumed that the bivariate process is Gaussian, or normal, it is uniquely
characterized by its means $\mu_x$ and $\mu_y$ and its covariance matrix. Figure 12.2 shows the
different kinds of covariances that need to be considered.

FIGURE 12.1 Input gas rate and output CO2 concentration from a gas furnace (time axis t in minutes; sampling intervals of 9 and 18 seconds are marked).

The autocovariance coefficients of each of the two series at lag k are defined by the
usual formula:

$$\gamma_{xx}(k) = E[(x_t - \mu_x)(x_{t+k} - \mu_x)] = E[(x_t - \mu_x)(x_{t-k} - \mu_x)]$$

$$\gamma_{yy}(k) = E[(y_t - \mu_y)(y_{t+k} - \mu_y)] = E[(y_t - \mu_y)(y_{t-k} - \mu_y)]$$

where we now use the extended notation $\gamma_{xx}(k)$ and $\gamma_{yy}(k)$ for the autocovariances of the
$x_t$ and $y_t$ series. The only other covariances that can appear in the covariance matrix are
the cross-covariance coefficients between the $x_t$ and $y_t$ series at lag +k:

$$\gamma_{xy}(k) = E[(x_t - \mu_x)(y_{t+k} - \mu_y)] \qquad k = 0, 1, 2, \ldots \tag{12.1.1}$$

and the cross-covariance coefficients between the $y_t$ and $x_t$ series at lag +k:

$$\gamma_{yx}(k) = E[(y_t - \mu_y)(x_{t+k} - \mu_x)] \qquad k = 0, 1, 2, \ldots \tag{12.1.2}$$

FIGURE 12.2 Autocovariances and cross-covariances of a bivariate stochastic process.

Under (bivariate) stationarity, these cross-covariances must be the same for all t and hence
are functions only of the lag k.

Note that, in general, $\gamma_{xy}(k)$ will not be the same as $\gamma_{yx}(k)$. However, since

$$\gamma_{xy}(k) = E[(x_{t-k} - \mu_x)(y_t - \mu_y)] = E[(y_t - \mu_y)(x_{t-k} - \mu_x)] = \gamma_{yx}(-k)$$

we need to define only one function $\gamma_{xy}(k)$ for $k = 0, \pm 1, \pm 2, \ldots$. The function $\gamma_{xy}(k) =
\operatorname{cov}[x_t, y_{t+k}]$, as defined in (12.1.1) for $k = 0, \pm 1, \pm 2, \ldots$, is called the cross-covariance
function of the stationary bivariate process. Similarly, the correlation between $x_t$ and $y_{t+k}$,
which is the dimensionless quantity given by

$$\rho_{xy}(k) = \frac{\gamma_{xy}(k)}{\sigma_x\sigma_y} \qquad k = 0, \pm 1, \pm 2, \ldots \tag{12.1.3}$$

is called the cross-correlation coefficient at lag k, and the function $\rho_{xy}(k)$, defined for
$k = 0, \pm 1, \pm 2, \ldots$, the cross-correlation function of the stationary bivariate process.

Since $\rho_{xy}(k)$ is not in general equal to $\rho_{xy}(-k)$, the cross-correlation function, in contrast
to the autocorrelation function, is not symmetric about k = 0. In fact, it will sometimes
happen that the cross-correlation function is zero over some range $-\infty$ to i or i to $+\infty$.
For example, consider the cross-covariance function between the series $a_t$ and $z_t$ for the
“delayed” first-order autoregressive process:

$$(1 - \phi B)z_t = a_{t-b} \qquad -1 < \phi < 1 \qquad b > 0$$

where $a_t$ is white noise with zero mean and variance $\sigma_a^2$. Then since

$$z_{t+k} = a_{t+k-b} + \phi a_{t+k-b-1} + \phi^2 a_{t+k-b-2} + \cdots$$

the cross-covariance function between the series $a_t$ and $z_t$ is

$$\gamma_{az}(k) = E[a_t z_{t+k}] = \begin{cases} \phi^{k-b}\sigma_a^2 & k \ge b \\ 0 & k < b \end{cases}$$

Hence, for the delayed autoregressive process, the cross-correlation function is

$$\rho_{az}(k) = \begin{cases} \phi^{k-b}\dfrac{\sigma_a}{\sigma_z} = \phi^{k-b}(1 - \phi^2)^{1/2} & k \ge b \\ 0 & k < b \end{cases}$$

Figure 12.3 shows this cross-correlation function when b = 2 and $\phi = 0.6$.
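The one-sided shape of this cross-correlation function can be checked by simulation. The R sketch below generates the delayed autoregressive process with b = 2 and $\phi = 0.6$ and compares the sample cross-correlations with the theoretical values $\phi^{k-b}(1 - \phi^2)^{1/2}$. The series length, the seed, and the helper name `rho_az` are arbitrary assumptions for illustration.

```r
set.seed(1)
phi <- 0.6; b <- 2; n <- 20000            # n and the seed are arbitrary choices
a <- rnorm(n)                             # white noise a_t
z <- numeric(n)
for (t in (b + 1):n) z[t] <- phi * z[t - 1] + a[t - b]   # (1 - phi B) z_t = a_{t-b}

# Theoretical cross-correlation between a_t and z_{t+k}
rho_az <- function(k) ifelse(k >= b, phi^(k - b) * sqrt(1 - phi^2), 0)

# ccf(z, a) at lag k estimates the correlation between z[t+k] and a[t]
lags   <- -6:6
sample <- drop(ccf(z, a, lag.max = 6, plot = FALSE)$acf)
round(cbind(lag = lags, sample = sample, theory = rho_az(lags)), 3)
```

The sample values at lags below b should be close to zero, with the largest value, near 0.8, at lag k = b = 2.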


FIGURE 12.3 Cross-correlation function between $a_t$ and $z_t$ for delayed autoregressive process
$z_t - 0.6z_{t-1} = a_{t-2}$.

12.1.2 Estimation of the Cross-Covariance and Cross-Correlation Functions

We assume that after differencing the original input and output time series d times, there
are n = N - d pairs of values $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ available for analysis. Then
it is shown, for example, in Jenkins and Watts (1968), that an estimate $c_{xy}(k)$ of the
cross-covariance coefficient at lag k is provided by

$$c_{xy}(k) = \begin{cases} \dfrac{1}{n}\displaystyle\sum_{t=1}^{n-k}(x_t - \bar{x})(y_{t+k} - \bar{y}) & k = 0, 1, 2, \ldots \\[2ex] \dfrac{1}{n}\displaystyle\sum_{t=1}^{n+k}(y_t - \bar{y})(x_{t-k} - \bar{x}) & k = 0, -1, -2, \ldots \end{cases} \tag{12.1.4}$$

where $\bar{x}$ and $\bar{y}$ are the sample means of the $x_t$ series and $y_t$ series, respectively. Similarly,
the estimate $r_{xy}(k)$ of the cross-correlation coefficient $\rho_{xy}(k)$ at lag k may be obtained
by substituting in (12.1.3) the estimates $c_{xy}(k)$ for $\gamma_{xy}(k)$, $s_x = \sqrt{c_{xx}(0)}$ for $\sigma_x$, and $s_y =
\sqrt{c_{yy}(0)}$ for $\sigma_y$, yielding

$$r_{xy}(k) = \frac{c_{xy}(k)}{s_x s_y} \qquad k = 0, \pm 1, \pm 2, \ldots \tag{12.1.5}$$
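The estimates (12.1.4) and (12.1.5) can be coded directly and compared with R's built-in `ccf()`. In the sketch below, the helper names `cross_cov` and `cross_cor` and the simulated series are illustrative assumptions. Note the argument order: `ccf(y, x)` at lag k estimates the correlation between `y[t+k]` and `x[t]`, which corresponds to $r_{xy}(k)$ in the notation used here.

```r
# Hypothetical helpers implementing (12.1.4) and (12.1.5)
cross_cov <- function(x, y, k) {
  n <- length(x); xb <- mean(x); yb <- mean(y)
  if (k >= 0) {
    t <- 1:(n - k)
    sum((x[t] - xb) * (y[t + k] - yb)) / n
  } else {
    t <- 1:(n + k)
    sum((y[t] - yb) * (x[t - k] - xb)) / n
  }
}
cross_cor <- function(x, y, k) {
  cross_cov(x, y, k) / sqrt(cross_cov(x, x, 0) * cross_cov(y, y, 0))
}

# Simulated example (assumed): the output responds negatively to the input
# after a delay of 3 time units
set.seed(2)
x <- as.numeric(arima.sim(list(ar = 0.5), n = 300))
y <- c(rep(0, 3), -0.8 * x[1:297]) + rnorm(300, sd = 0.5)

lags   <- -8:8
r_hand <- sapply(lags, function(k) cross_cor(x, y, k))
r_ccf  <- drop(ccf(y, x, lag.max = 8, plot = FALSE)$acf)

max(abs(r_hand - r_ccf))           # agreement with the built-in estimate
lags[which.max(abs(r_hand))]       # lag of the largest cross-correlation
```

Both estimates use the divisor n and the full-sample means, so they should agree to rounding error, and the dominant (negative) cross-correlation should appear at lag +3.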

The top graph in Figure 12.4 shows the estimated cross-correlation function $r_{xy}(k)$
between the input and output series for the discrete gas furnace data obtained by reading
the continuous data of Figure 12.1 at intervals of 9 seconds. Note that the cross-correlation
function is not symmetrical about zero and has a well-defined peak at k = +5, indicating
that the output lags behind the input. The cross-correlations are negative. This is to be
expected since an increase in the coded input produces a decrease in the output as seen
from Figure 12.1. The autocorrelation functions of the input and output variables are also
included in Figure 12.4. Both variables are highly autocorrelated and the slowly decaying
patterns are indicative of an autoregressive dependence structure in these series.

Figure 12.4 can be reproduced in R as follows:

> gasfur = read.table('SeriesJ.txt', header=T)   # Series J: input X, output Y
> X = gasfur[,1]
> Y = gasfur[,2]
> CCF = ccf(Y, X)     # lag k estimates the correlation between X[t] and Y[t+k]
> ACF.y = acf(Y)
> ACF.x = acf(X)
> par(mfrow=c(3,1))   # three panels: CCF, ACF of input, ACF of output
> plot(CCF, ylab="CCF", main="Cross Correlation Between Input and Output")
> plot(ACF.x, main="ACF for Input")
> plot(ACF.y, main="ACF for Output")

FIGURE 12.4 Estimated cross-correlation function between input and output for coded gas furnace
data read at 9-second intervals along with the autocorrelation functions for the individual series.


12.1.3 Approximate Standard Errors of Cross-Correlation Estimates 

A crude check as to whether certain values of the cross-correlation function $\rho_{xy}(k)$ could be
effectively zero may be made by comparing the corresponding cross-correlation estimates
with their approximate standard errors. Bartlett (1955) showed that the covariance between
two cross-correlation estimates $r_{xy}(k)$ and $r_{xy}(k + l)$ is, on the normal assumption, and
$k \ge 0$, given by

$$\begin{aligned}
\operatorname{cov}[r_{xy}(k), r_{xy}(k+l)] \simeq (n-k)^{-1}\sum_{v=-\infty}^{\infty}\Big\{ & \rho_{xx}(v)\rho_{yy}(v+l) + \rho_{xy}(-v)\rho_{xy}(v+2k+l) \\
& + \rho_{xy}(k)\rho_{xy}(k+l)\big[\rho_{xy}^2(v) + \tfrac{1}{2}\rho_{xx}^2(v) + \tfrac{1}{2}\rho_{yy}^2(v)\big] \\
& - \rho_{xy}(k)\big[\rho_{xx}(v)\rho_{xy}(v+k+l) + \rho_{xy}(-v)\rho_{yy}(v+k+l)\big] \\
& - \rho_{xy}(k+l)\big[\rho_{xx}(v)\rho_{xy}(v+k) + \rho_{xy}(-v)\rho_{yy}(v+k)\big]\Big\}
\end{aligned} \tag{12.1.6}$$
In particular, setting l = 0,

$$\begin{aligned}
\operatorname{var}[r_{xy}(k)] \simeq (n-k)^{-1}\sum_{v=-\infty}^{\infty}\Big\{ & \rho_{xx}(v)\rho_{yy}(v) + \rho_{xy}(k+v)\rho_{xy}(k-v) \\
& + \rho_{xy}^2(k)\big[\rho_{xy}^2(v) + \tfrac{1}{2}\rho_{xx}^2(v) + \tfrac{1}{2}\rho_{yy}^2(v)\big] \\
& - 2\rho_{xy}(k)\big[\rho_{xx}(v)\rho_{xy}(v+k) + \rho_{xy}(-v)\rho_{yy}(v+k)\big]\Big\}
\end{aligned} \tag{12.1.7}$$

Formulas that apply to important special cases can be derived from these general
expressions. For example, if we assume that $x_t = y_t$, it becomes appropriate to set

$$\rho_{xx}(v) = \rho_{yy}(v) = \rho_{xy}(v) = \rho_{xy}(-v)$$

On making this substitution in (12.1.6) and (12.1.7), we obtain an expression for the
covariance between two autocorrelation estimates and, more particularly, the expression
for the variance of an autocorrelation estimate given earlier in (2.1.13).

It is often the case that two processes are appreciably cross-correlated only over some
rather narrow range of lags. Suppose it is postulated that $\rho_{xy}(v)$ is nonzero only over some
range $Q_1 \le v \le Q_2$. Then,

1. If neither $k$, $k + l$, nor $k + \tfrac{1}{2}l$ are included in this range, all terms in (12.1.6) except
the first are zero, and

$$\operatorname{cov}[r_{xy}(k), r_{xy}(k+l)] \simeq (n-k)^{-1}\sum_{v=-\infty}^{\infty}\rho_{xx}(v)\rho_{yy}(v+l) \tag{12.1.8}$$

2. If k is not included in this range, then in a similar way (12.1.7) reduces to

$$\operatorname{var}[r_{xy}(k)] \simeq (n-k)^{-1}\sum_{v=-\infty}^{\infty}\rho_{xx}(v)\rho_{yy}(v) \tag{12.1.9}$$

In particular, on the hypothesis that the two processes have no cross-correlation, that
is, cross-correlations are zero for all lags, it follows that the simple formulas (12.1.8) and
(12.1.9) apply for all lags $k$ and $k + l$.

Another special case of some interest occurs when two processes are not cross-correlated
and one is white noise. Suppose that $x_t = a_t$ is generated by a white noise process but $y_t$ is
autocorrelated. Then from (12.1.8),

$$\operatorname{cov}[r_{ay}(k), r_{ay}(k+l)] \simeq (n-k)^{-1}\rho_{yy}(l) \tag{12.1.10}$$

$$\operatorname{var}[r_{ay}(k)] \simeq (n-k)^{-1} \tag{12.1.11}$$

Hence, it follows that

$$\rho[r_{ay}(k), r_{ay}(k+l)] \simeq \rho_{yy}(l) \tag{12.1.12}$$

Thus, in this case the cross-correlations have the same autocorrelation function as the
process generating the output $y_t$. Consequently, even though $a_t$ and $y_t$ are not cross-correlated,
the sample cross-correlation function can be expected to vary about zero with standard
deviation $(n - k)^{-1/2}$ in a systematic pattern typical of the behavior of the autocorrelation
function $\rho_{yy}(l)$. Finally, if two processes are both white noise and are not cross-correlated,
the covariance between cross-correlation estimates at different lags will be zero.
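The approximations (12.1.11) and (12.1.12) can be illustrated by a small Monte Carlo experiment. In the R sketch below, a white noise series and an independent AR(1) series are generated repeatedly, and the sample cross-correlations at lags k and k + 1 are recorded. The series length, AR parameter, number of replications, and seed are arbitrary assumptions.

```r
set.seed(3)
n <- 200; phi <- 0.7; nrep <- 2000; k <- 2   # assumed simulation settings
L <- k + 1                                   # largest lag requested from ccf()

r <- t(replicate(nrep, {
  a  <- rnorm(n)                                      # white noise input
  y  <- as.numeric(arima.sim(list(ar = phi), n = n))  # independent AR(1) output
  cc <- drop(ccf(y, a, lag.max = L, plot = FALSE)$acf)  # lags -L, ..., L
  cc[L + 1 + c(k, k + 1)]                             # r_ay(k) and r_ay(k + 1)
}))

var_ratio <- var(r[, 1]) * (n - k)   # close to 1 if (12.1.11) holds
cor_adj   <- cor(r[, 1], r[, 2])     # close to rho_yy(1) = phi if (12.1.12) holds
c(var_ratio = var_ratio, cor_adj = cor_adj)
```

Although the two series are independent, neighboring cross-correlation estimates are themselves strongly correlated, which is why a sample cross-correlation function can show smooth, apparently systematic patterns even when no relationship exists.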


12.2 IDENTIFICATION OF TRANSFER FUNCTION MODELS 

We now show how to identify a combined transfer function-noise model

$$Y_t = \delta^{-1}(B)\omega(B)X_{t-b} + N_t$$

for a linear system corrupted by noise $N_t$ at the output and assumed to be generated by an
ARIMA process that is statistically independent¹ of the input $X_t$. Specifically, the objective
at this stage is to obtain some idea of the orders r and s of the denominator and numerator
operators in the transfer function model and to derive initial guesses for the parameters $\delta$, $\omega$,
and the delay parameter b. In addition, we aim to make initial guesses of the orders p, d, q
of the ARIMA process describing the noise at the output and to obtain initial estimates of
the parameters $\phi$ and $\theta$ in that model. The tentative transfer function and noise models
so obtained can then be used as a starting point for more efficient estimation methods
described in Section 12.3.

Outline of the Identification Procedure. Suppose that the transfer function model

$$Y_t = v(B)X_t + N_t \tag{12.2.1}$$

may be parsimoniously parameterized in the form

$$Y_t = \delta^{-1}(B)\omega(B)X_{t-b} + N_t \tag{12.2.2}$$

where $\delta(B) = 1 - \delta_1 B - \delta_2 B^2 - \cdots - \delta_r B^r$ and $\omega(B) = \omega_0 - \omega_1 B - \omega_2 B^2 - \cdots - \omega_s B^s$.
The identification procedure is as follows:

1. Derive rough estimates $\hat{v}_j$ of the impulse response weights $v_j$ in (12.2.1).

2. Use the estimates $\hat{v}_j$ so obtained to make guesses of the orders r and s of the
denominator and numerator operators in (12.2.2) and of the delay parameter b.

3. Substitute the estimates $\hat{v}_j$ in equations (11.2.8) with values of r, s, and b obtained
from step 2 to obtain initial estimates of the parameters $\delta$ and $\omega$ in (12.2.2).

Knowing the $\hat{v}_j$, values of b, r, and s may be guessed using the following facts established
in Section 11.2.2. For a model of the form of (12.2.2), the impulse response weights $v_j$
consist of:

1. b zero values $v_0, v_1, \ldots, v_{b-1}$.

2. A further s - r + 1 values $v_b, v_{b+1}, \ldots, v_{b+s-r}$ following no fixed pattern (no such
values occur if s < r).

3. Values $v_j$ with $j \ge b + s - r + 1$ that follow the pattern dictated by an rth-order
difference equation that has r starting values $v_{b+s}, \ldots, v_{b+s-r+1}$. Starting values $v_j$
for j < b will, of course, be zero.

¹ When the input is at our choice, we can guarantee that it is independent of $N_t$ by generating $X_t$ according to
some random process.
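The three-part pattern described above can be seen by generating the impulse response weights recursively from the identity $\delta(B)v(B) = \omega(B)B^b$. The R sketch below does this for an assumed model with r = 1, s = 2, and b = 3; the function name `impulse_weights` and the parameter values are illustrative assumptions, not values from the text.

```r
# Hypothetical helper: impulse response weights v_0, ..., v_nlag implied by
# delta(B) v(B) = omega(B) B^b, with omega(B) = omega_0 - omega_1 B - ... - omega_s B^s
impulse_weights <- function(delta, omega, b, nlag = 15) {
  r <- length(delta); s <- length(omega) - 1
  v <- numeric(nlag + 1)                 # v[j + 1] holds v_j
  for (j in 0:nlag) {
    ar <- 0
    for (i in seq_len(r)) if (j - i >= 0) ar <- ar + delta[i] * v[j - i + 1]
    ma <- 0
    if (j == b) ma <- omega[1]                        # coefficient omega_0
    if (j > b && j <= b + s) ma <- -omega[j - b + 1]  # coefficient -omega_{j-b}
    v[j + 1] <- ar + ma
  }
  v
}

# Assumed example: r = 1, s = 2, b = 3
v <- impulse_weights(delta = 0.6, omega = c(2, -1, 0.5), b = 3, nlag = 10)
round(v, 4)
```

With these assumed values the output shows b = 3 zeros ($v_0, v_1, v_2$), then s - r + 1 = 2 values with no fixed pattern ($v_3 = 2$, $v_4 = 2.2$), and from $v_5 = 0.82$ onward a geometric decay with ratio $\delta_1 = 0.6$, the first-order difference equation pattern.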

Differencing of the Input and Output. The basic tool that is employed here in the
identification procedure is the cross-correlation function between input and output. When the
processes are nonstationary, it is assumed that stationarity can be induced by suitable
differencing. Nonstationary behavior is suspected if the estimated auto- and cross-correlation
functions of the $(X_t, Y_t)$ series fail to damp out quickly. We assume that a degree of
differencing² d necessary to induce stationarity has been achieved when the estimated auto- and
cross-correlations $r_{xx}(k)$, $r_{yy}(k)$, and $r_{xy}(k)$ of $x_t = \nabla^d X_t$ and $y_t = \nabla^d Y_t$ damp out quickly.
In practice, d is usually 0, 1, or 2.

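Identification of the Impulse Response Function Without Prewhitening. Suppose that
after differencing d times, the model (12.2.1) can be written in the form

$$y_t = v_0 x_t + v_1 x_{t-1} + v_2 x_{t-2} + \cdots + n_t \tag{12.2.3}$$

where $y_t = \nabla^d Y_t$, $x_t = \nabla^d X_t$, and $n_t = \nabla^d N_t$ are stationary processes with zero means.
Then, on multiplying throughout in (12.2.3) by $x_{t-k}$ for $k \ge 0$, we obtain

$$x_{t-k}y_t = v_0 x_{t-k}x_t + v_1 x_{t-k}x_{t-1} + \cdots + x_{t-k}n_t \tag{12.2.4}$$

If we make the further assumption that $x_{t-k}$ is uncorrelated with $n_t$ for all k, taking
expectations in (12.2.4) yields the set of equations

$$\gamma_{xy}(k) = v_0\gamma_{xx}(k) + v_1\gamma_{xx}(k-1) + \cdots \qquad k = 0, 1, 2, \ldots \tag{12.2.5}$$

Suppose that the weights $v_j$ are effectively zero beyond k = K. Then the first K + 1 of
the equations (12.2.5) can be written as

$$\boldsymbol{\gamma}_{xy} = \boldsymbol{\Gamma}_{xx}\mathbf{v} \tag{12.2.6}$$

where

$$\boldsymbol{\gamma}_{xy} = \begin{bmatrix} \gamma_{xy}(0) \\ \gamma_{xy}(1) \\ \vdots \\ \gamma_{xy}(K) \end{bmatrix} \qquad \mathbf{v} = \begin{bmatrix} v_0 \\ v_1 \\ \vdots \\ v_K \end{bmatrix} \qquad \boldsymbol{\Gamma}_{xx} = \begin{bmatrix} \gamma_{xx}(0) & \gamma_{xx}(1) & \cdots & \gamma_{xx}(K) \\ \gamma_{xx}(1) & \gamma_{xx}(0) & \cdots & \gamma_{xx}(K-1) \\ \vdots & \vdots & & \vdots \\ \gamma_{xx}(K) & \gamma_{xx}(K-1) & \cdots & \gamma_{xx}(0) \end{bmatrix}$$

² The procedures outlined can equally well be used when different degrees of differencing are employed for input
and output.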
Substituting estimates $c_{xx}(k)$ of the autocovariance function of the input $x_t$ and estimates
$c_{xy}(k)$ of the cross-covariance function between the input $x_t$ and output $y_t$, (12.2.6) provides
K + 1 linear equations for the first K + 1 weights. However, these equations, which do
not in general provide efficient estimates, are cumbersome to solve for large K and in
any case require knowledge of the point K beyond which the $v_j$ are effectively zero. The
sample version of equations (12.2.6) represents essentially, apart from “end effects,” the
least-squares normal equations from linear regression of $y_t$ on $x_t, x_{t-1}, \ldots, x_{t-K}$, in which
it is assumed, implicitly, that the noise $n_t$ in (12.2.3) is not autocorrelated. This is one
source of the inefficiency in this identification method, which may be called the regression
method. To improve the efficiency of this method, Liu and Hanssens (1982) (see also
Pankratz (1991, Chapter 5)) suggest performing generalized least-squares estimation of the
regression equation $y_t = v_0 x_t + v_1 x_{t-1} + \cdots + v_K x_{t-K} + n_t$ assuming the noise $n_t$ follows
some autocorrelated time series ARMA model. They also discuss generalization of this
method of identification of impulse response functions to the case with multiple input
processes $X_{1t}, X_{2t}, \ldots, X_{mt}$ in the model, that is, $Y_t = v_1(B)X_{1t} + \cdots + v_m(B)X_{mt} + N_t$.
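As a rough illustration of the regression method, the R sketch below solves the sample version of (12.2.6) for simulated data, building the Toeplitz matrix of input autocovariances with `toeplitz()` and solving for the weights with `solve()`. The simulated input model, the true weights `v_true`, the noise level, the cutoff K, and the seed are all assumptions made for this example.

```r
set.seed(4)
n <- 2000; K <- 6                                  # assumed sample size and cutoff K
x <- as.numeric(arima.sim(list(ar = 0.7), n = n + 10))   # autocorrelated input
v_true <- c(0, 0, 1.5, 0.9, 0.4, 0, 0)             # assumed weights v_0, ..., v_6
# y_t = sum_j v_j x_{t-j} + n_t, with white noise n_t (assumed)
y <- as.numeric(stats::filter(x, v_true, sides = 1)) + rnorm(n + 10, sd = 0.3)
keep <- (K + 1):(n + 10)                           # drop start-up values (NA in y)
x <- x[keep]; y <- y[keep]

# Sample autocovariances c_xx(0..K) and cross-covariances c_xy(0..K)
cxx <- drop(acf(x, lag.max = K, type = "covariance", plot = FALSE)$acf)
cxy <- drop(ccf(y, x, lag.max = K, type = "covariance",
                plot = FALSE)$acf)[(K + 1):(2 * K + 1)]

Gamma <- toeplitz(cxx)              # sample version of the matrix in (12.2.6)
v_hat <- solve(Gamma, cxy)          # estimated impulse response weights
round(cbind(j = 0:K, v_true = v_true, v_hat = v_hat), 3)
```

In this simulated case the noise is white, so the implicit assumption of the regression method holds; with autocorrelated noise the same calculation would still run but would be less efficient, as discussed above.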

12.2.1 Identification of Transfer Function Models by Prewhitening the Input 

Considerable simplification in the identification process would occur if the input to the 
system were white noise. Indeed, as discussed in more detail in Section 12.5, when the 
choice of the input is at our disposal, there is much to recommend such an input. When 
the original input follows some other stochastic process, simplification is possible by 
prewhitening. 

Suppose that the suitably differenced input process $x_t$ is stationary and is capable
of representation by some member of the general linear class of autoregressive-moving
average models. Then, given a set of data, we can carry out our usual identification and
estimation methods to obtain a model for the $x_t$ process:

$$\theta_x^{-1}(B)\phi_x(B)x_t = \alpha_t \tag{12.2.7}$$

which, to a close approximation, transforms the correlated input series $x_t$ to the uncorrelated
white noise series $\alpha_t$. At the same time, we can obtain an estimate $s_\alpha^2$ of $\sigma_\alpha^2$ from the sum
of squares of the $\hat{\alpha}_t$'s. If we now apply this same transformation to $y_t$ to obtain

$$\beta_t = \theta_x^{-1}(B)\phi_x(B)y_t$$

then the model (12.2.3) may be written as

$$\beta_t = v(B)\alpha_t + \varepsilon_t \tag{12.2.8}$$

where $\varepsilon_t$ is the transformed noise series defined by

$$\varepsilon_t = \theta_x^{-1}(B)\phi_x(B)n_t \tag{12.2.9}$$

On multiplying (12.2.8) on both sides by $\alpha_{t-k}$ and taking expectations, we obtain

$$\gamma_{\alpha\beta}(k) = v_k\sigma_\alpha^2 \tag{12.2.10}$$
where $\gamma_{\alpha\beta}(k) = E[\alpha_{t-k}\beta_t]$ is the cross-covariance at lag +k between the series $\alpha_t$ and $\beta_t$.
Thus,

$$v_k = \frac{\gamma_{\alpha\beta}(k)}{\sigma_\alpha^2}$$

or, in terms of the cross-correlations,

$$v_k = \frac{\rho_{\alpha\beta}(k)\sigma_\beta}{\sigma_\alpha} \qquad k = 0, 1, 2, \ldots \tag{12.2.11}$$

Hence, after prewhitening the input, the cross-correlation function between the prewhitened
input and correspondingly transformed output is directly proportional to the impulse
response function. We note that the effect of prewhitening is to convert the nonorthogonal
set of equations (12.2.6) into the orthogonal set (12.2.10).

In practice, we do not know the theoretical cross-correlation function $\rho_{\alpha\beta}(k)$, so we must
substitute estimates in (12.2.11) to give

$$\hat{v}_k = \frac{r_{\alpha\beta}(k)s_\beta}{s_\alpha} \qquad k = 0, 1, 2, \ldots \tag{12.2.12}$$

The preliminary estimates $\hat{v}_k$ so obtained are again, in general, statistically inefficient but
can provide a rough basis for selecting suitable operators $\delta(B)$ and $\omega(B)$ in the transfer
function model. An additional feature of the prewhitening method is that because the
prewhitened input series $\alpha_t$ is white noise, so that $\rho_{\alpha\alpha}(k) = 0$ for all $k \ne 0$, there are
considerable simplifications in formulas (12.1.7) and (12.1.9) for $\operatorname{var}[r_{\alpha\beta}(k)]$. In particular,
on the assumption that the series $\alpha_t$ and $\beta_t$ are not cross-correlated, the result (12.1.11)
applies to give simply $\operatorname{var}[r_{\alpha\beta}(k)] \simeq (n - k)^{-1}$. We now illustrate this identification and
preliminary estimation procedure with an actual example.
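Before turning to the real data, the prewhitening steps (12.2.7) to (12.2.12) can be sketched in R on simulated series. Everything specific in this example is an assumption made for illustration: the AR(2) input, the true weights `v_true`, the AR(1) noise, the seed, and the use of `ar()` with Yule-Walker estimation as the input model. It is a sketch of the procedure, not the estimation method used in the text.

```r
set.seed(5)
n <- 1000
x <- as.numeric(arima.sim(list(ar = c(1.2, -0.5)), n = n))   # assumed AR(2) input
v_true <- c(0, 0, 0, 2, 1, 0.5)                              # assumed v_0, ..., v_5
N <- as.numeric(arima.sim(list(ar = 0.5), n = n, sd = 0.5))  # autocorrelated noise
y <- as.numeric(stats::filter(x, v_true, sides = 1)) + N

# Step 1: fit an AR model to the input, as in (12.2.7) with theta_x(B) = 1
fit <- ar(x, order.max = 2, aic = FALSE, demean = FALSE)
pw  <- c(1, -fit$ar)                       # coefficients of phi_x(B)

# Step 2: apply the same filter to input and output
alpha <- as.numeric(stats::filter(x, pw, sides = 1))   # prewhitened input
beta  <- as.numeric(stats::filter(y, pw, sides = 1))   # transformed output
ok <- complete.cases(alpha, beta)          # drop start-up values lost to filtering
alpha <- alpha[ok]; beta <- beta[ok]

# Step 3: cross-correlations and rough impulse response estimates (12.2.12)
K <- 8
r_ab  <- drop(ccf(beta, alpha, lag.max = K, plot = FALSE)$acf)[(K + 1):(2 * K + 1)]
v_hat <- r_ab * sd(beta) / sd(alpha)
se_r  <- 1 / sqrt(length(alpha) - 0:K)     # approximate s.e. from (12.1.11)
round(cbind(k = 0:K, r = r_ab, se = se_r, v_hat = v_hat), 3)
```

In output of this kind, the lags at which $r_{\alpha\beta}(k)$ is small relative to its approximate standard error suggest the delay b, and the pattern of the remaining $\hat{v}_k$ suggests r and s.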


12.2.2 Example of the Identification of a Transfer Function Model 

In an investigation on adaptive optimization (Kotnour et al., 1966), a gas furnace was
employed in which air and methane combined to form a mixture of gases containing CO2
(carbon dioxide). The air feed was kept constant, but the methane feed rate could be varied
in any desired manner, and the resulting CO2 concentration in the off-gases measured. The
continuous data of Figure 12.1 were collected to provide information about the dynamics
of the system over a region of interest where it was known that an approximately linear
steady-state relationship applied. The continuous stochastic input series X(t) shown in the
top half of Figure 12.1 was generated by passing white noise through a linear filter. The
process had mean zero and, during the realization that was used for this experiment, varied
from -2.5 to +2.5. It was desired that the actual methane gas feed rate should cover a range
from 0.5 to 0.7 ft³/min. To ensure this, the input gas feed rate was caused to follow the
process:

Methane gas input feed = 0.60 - 0.04X(t)

For simplicity, we will work throughout with the “coded” input X(t). The final transfer 
function expressed in terms of the actual feed rate is readily obtained by substitution. 
TABLE 12.1 Estimated Cross-Correlation Function After Prewhitening and Approximate
Impulse Response Function for Gas Furnace Data

  k    $r_{\alpha\beta}(k)$    $\hat\sigma(r)$    $\hat v_k$    $r_{\beta\beta}(k)$
  0                            0.06               -0.02         1.00
  1                            0.06                0.10         0.23
  2                            0.06               -0.06         0.36
  3                            0.05               -0.53         0.13
  4                            0.06               -0.63         0.08
  5                            0.05               -0.88         0.01
  6    -0.27                   0.06               -0.52         0.12
  7    -0.17                   0.06               -0.32         0.05
  8    -0.03                   0.06               -0.06         0.09
  9     0.03                   0.06                0.06         0.01
 10    -0.06                   0.06               -0.10         0.10

Series J in the Collection of Time Series section in Part Five shows 296 successive pairs
of observations $(X_t, Y_t)$ read off from the continuous records at 9-second intervals. In
this particular experiment, the nature of the input disturbance was known because it was
deliberately induced. However, we proceed as if it were not known. As shown in Figure
12.4, the estimated auto- and cross-correlation functions of $X_t$ and $Y_t$ damp out fairly
quickly, confirming that no differencing is necessary. The usual model identification and
fitting procedures applied to the input series $X_t$ indicate that it is well described by a
third-order autoregressive process

$$(1 - \phi_1 B - \phi_2 B^2 - \phi_3 B^3) X_t = \alpha_t$$

with $\hat\phi_1 = 1.97$, $\hat\phi_2 = -1.37$, $\hat\phi_3 = 0.34$, and $s_\alpha^2 = 0.0353$. Hence, the transformations

$$\alpha_t = (1 - 1.97B + 1.37B^2 - 0.34B^3) X_t$$
$$\beta_t = (1 - 1.97B + 1.37B^2 - 0.34B^3) Y_t$$

are applied to the input and output series to yield the series $\alpha_t$ and $\beta_t$ with $s_\alpha = 0.188$
and $s_\beta = 0.358$. The estimated cross-correlation function between $\alpha_t$ and $\beta_t$ is listed in
Table 12.1 and plotted in Figure 12.5. Table 12.1 also includes the estimate (12.2.12) of
the impulse response function,

$$\hat v_k = \frac{0.358}{0.188}\, r_{\alpha\beta}(k)$$

The approximate standard errors $\hat\sigma(r)$ for the estimated cross-correlations $r_{\alpha\beta}(k)$ shown in
Table 12.1 are the square roots of the variances obtained from expression (12.1.7):

1. With cross-correlations up to lag +2 and from lag +8 onward assumed equal to zero

2. With autocorrelations $\rho_{\alpha\alpha}(k)$ assumed zero for $k > 0$

3. With autocorrelations $\rho_{\beta\beta}(k)$ assumed zero for $k > 4$

4. With estimated correlations $r_{\alpha\beta}(k)$ and $r_{\beta\beta}(k)$ from Table 12.1 replacing theoretical
values.

For this example, the standard errors $\hat\sigma(r)$ differ very little from the approximate
values $(n - k)^{-1/2}$, or as a further approximation $n^{-1/2} = 0.06$, appropriate
under the hypothesis that the series are uncorrelated. The estimated cross-correlations
along with the approximate two standard error limits are plotted in
Figure 12.5.
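
The scaling of the cross-correlations into impulse response estimates, and the rough standard error $(n - k)^{-1/2}$, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the cross-correlations are the lag 6 to 10 entries of Table 12.1, and $n = 296$ is used as the series length, ignoring the few values lost in prewhitening.

```python
import math

S_ALPHA = 0.188  # standard deviation of the prewhitened input (from the text)
S_BETA = 0.358   # standard deviation of the transformed output (from the text)

# Estimated cross-correlations r_alpha_beta(k) at lags 6..10, copied from Table 12.1.
r_alpha_beta = {6: -0.27, 7: -0.17, 8: -0.03, 9: 0.03, 10: -0.06}


def impulse_response_estimate(r, s_alpha=S_ALPHA, s_beta=S_BETA):
    """Estimate (12.2.12): v_k = (s_beta / s_alpha) * r_alpha_beta(k)."""
    return (s_beta / s_alpha) * r


def approx_std_error(n, k):
    """Approximate standard error (n - k)^(-1/2) of a cross-correlation."""
    return 1.0 / math.sqrt(n - k)


v_hat = {k: impulse_response_estimate(r) for k, r in r_alpha_beta.items()}
std_err = {k: approx_std_error(296, k) for k in r_alpha_beta}

for k in sorted(v_hat):
    print(k, round(v_hat[k], 2), round(std_err[k], 2))
```

Because the tabulated cross-correlations are rounded to two decimals, the recomputed $\hat v_k$ can differ from Table 12.1 in the last digit.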
FIGURE 12.5 Estimated cross-correlation function for coded gas furnace data after prewhitening.


The values $\hat v_0$, $\hat v_1$, and $\hat v_2$ are small compared with their standard errors, suggesting that
$b = 3$ (that there are two whole periods of delay). Using the results of Section 12.1.1, the
subsequent pattern of the $\hat v$'s might be accounted for by a model with $(r, s, b)$ equal to either
$(1, 2, 3)$ or $(2, 2, 3)$. The first model would imply that $v_3$ and $v_4$ were preliminary values
following no fixed pattern and that $v_5$ provided the starting value for an exponential decay
determined by the difference equation $v_j - \delta v_{j-1} = 0$, $j > 5$. The second model would
imply that $v_3$ was a single preliminary value and that $v_4$ and $v_5$ provided the starting values
for a pattern of double exponential decay or damped sinusoidal decay determined by the
difference equation $v_j - \delta_1 v_{j-1} - \delta_2 v_{j-2} = 0$, $j > 5$. Thus, the preliminary identification
suggests a transfer function model

$$(1 - \delta_1 B - \delta_2 B^2) Y_t = (\omega_0 - \omega_1 B - \omega_2 B^2) X_{t-b} \tag{12.2.13}$$

or some simplification of it, probably with $b = 3$.

Calculations in R. The prewhitening, the calculation of $r_{\alpha\beta}(k)$, $\hat v_k$, and $r_{\beta\beta}(k)$ in Table
12.1, and the creation of Figure 12.5 can be performed using the R code provided below.
Note, however, that the results from R differ very slightly from those shown in Table 12.1,
possibly due to round-off and differences in the treatment of initial values in the series.

> mm1=arima(X,order=c(3,0,0))
> mm1   # Prints the AR(3) coefficients for X
Call: arima(x = X, order = c(3, 0, 0))
Coefficients:
         ar1      ar2     ar3  intercept
      1.9691  -1.3651  0.3394    -0.0606
s.e.  0.0544   0.0985  0.0543     0.1898
sigma^2 estimated as 0.0353: log likelihood=72.6, aic=-135.1
> f1=c(1,-mm1$coef[1:3])   # Creates a filter to transform Y
> f1
            ar1     ar2     ar3
 1.0000 -1.9691  1.3651 -0.3394
> Yf=filter(Y,f1,method=c("convolution"),sides=1)
> yprev=Yf[4:296]   # transformed Y
> xprev=mm1$residuals[4:296]   # transformed X
> CCF=ccf(yprev,xprev)   # computes the cross-correlations
> CCF   # retrieves the cross-correlations
> vk=(sd(yprev)/sd(xprev))*CCF$acf   # impulse response function
> ACF=acf(yprev)   # autocorrelations of transformed Y
> plot(CCF, ylab='CCF',main='Cross-correlations after prewhitening')

Preliminary Estimates. Assuming the model (12.2.13) with $b = 3$, the equations (11.2.8)
for the impulse response function are

$$
\begin{aligned}
v_j &= 0, \qquad j < 3 \\
v_3 &= \omega_0 \\
v_4 &= \delta_1 v_3 - \omega_1 \\
v_5 &= \delta_1 v_4 + \delta_2 v_3 - \omega_2 \\
v_6 &= \delta_1 v_5 + \delta_2 v_4 \\
v_7 &= \delta_1 v_6 + \delta_2 v_5
\end{aligned}
\tag{12.2.14}
$$

Substituting the estimates $\hat v_k$ from Table 12.1 in the last two of these equations, we obtain

$$
\begin{aligned}
-0.88\hat\delta_1 - 0.63\hat\delta_2 &= -0.52 \\
-0.52\hat\delta_1 - 0.88\hat\delta_2 &= -0.32
\end{aligned}
$$

which give preliminary estimates $\hat\delta_1 = 0.57$ and $\hat\delta_2 = 0.02$. If these values are now
substituted in the second, third, and fourth of equations (12.2.14), we obtain

$$
\begin{aligned}
\hat\omega_0 &= \hat v_3 = -0.53 \\
\hat\omega_1 &= \hat\delta_1 \hat v_3 - \hat v_4 = (0.57)(-0.53) + 0.63 = 0.33 \\
\hat\omega_2 &= \hat\delta_1 \hat v_4 + \hat\delta_2 \hat v_3 - \hat v_5 = (0.57)(-0.63) + (0.02)(-0.53) + 0.88 = 0.51
\end{aligned}
$$

Thus, the preliminary identification suggests a tentative transfer function model:

$$(1 - 0.57B - 0.02B^2) Y_t = -(0.53 + 0.33B + 0.51B^2) X_{t-3}$$

The estimates so obtained can be used as starting values for the more efficient iterative
estimation methods, which will be described in Section 12.3. Note that the estimate $\hat\delta_2$ is
very small and suggests that this parameter may be omitted, but we will retain it for the
time being.
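
The arithmetic of these preliminary estimates can be retraced with the following Python sketch. It is an illustration only: `solve_2x2` is a hypothetical helper applying Cramer's rule, and the $\hat v_k$ values are the rounded entries of Table 12.1. Working from rounded inputs, the second coefficient comes out near 0.025, which the text reports as 0.02.

```python
def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Solve a11*x1 + a12*x2 = b1 and a21*x1 + a22*x2 = b2 by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("singular system")
    x1 = (b1 * a22 - a12 * b2) / det
    x2 = (a11 * b2 - a21 * b1) / det
    return x1, x2


# Impulse response estimates from Table 12.1 (lags 3 to 7).
v = {3: -0.53, 4: -0.63, 5: -0.88, 6: -0.52, 7: -0.32}

# Last two equations of (12.2.14):
#   v6 = delta1*v5 + delta2*v4
#   v7 = delta1*v6 + delta2*v5
delta1, delta2 = solve_2x2(v[5], v[4], v[6], v[5], v[6], v[7])

# Second, third, and fourth equations of (12.2.14), solved for the omegas.
omega0 = v[3]
omega1 = delta1 * v[3] - v[4]
omega2 = delta1 * v[4] + delta2 * v[3] - v[5]

print(round(delta1, 2), round(delta2, 3))
print(round(omega0, 2), round(omega1, 2), round(omega2, 2))
```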
12.2.3 Identification of the Noise Model

Reverting to the general case, suppose that (where necessary, after suitable differencing)
the model could be written as

$$y_t = v(B) x_t + n_t$$

where $n_t = \nabla^d N_t$. Given that a preliminary estimate $\hat v(B)$ of the transfer function has
been obtained in the manner discussed in Section 12.2.2, an estimate of the noise series is
provided by

$$\hat n_t = y_t - \hat v(B) x_t$$

that is,

$$\hat n_t = y_t - \hat v_0 x_t - \hat v_1 x_{t-1} - \hat v_2 x_{t-2} - \cdots$$

Alternatively, $\hat v(B)$ may be replaced by the tentative transfer function model estimate
$\hat\delta^{-1}(B)\hat\omega(B)B^b$ determined by preliminary identification. Thus,

$$\hat n_t = y_t - \hat\delta^{-1}(B)\hat\omega(B) x_{t-b}$$

and $\hat n_t$ may be computed by first calculating $\hat{\mathcal{Y}}_t = \hat\delta^{-1}(B)\hat\omega(B) x_{t-b}$ recursively through
$\hat\delta(B)\hat{\mathcal{Y}}_t = \hat\omega(B) x_{t-b}$ as

$$\hat{\mathcal{Y}}_t = \hat\delta_1 \hat{\mathcal{Y}}_{t-1} + \cdots + \hat\delta_r \hat{\mathcal{Y}}_{t-r} + \hat\omega_0 x_{t-b} - \hat\omega_1 x_{t-b-1} - \cdots - \hat\omega_s x_{t-b-s} \tag{12.2.15}$$

and then computing the noise series from $\hat n_t = y_t - \hat{\mathcal{Y}}_t$. In either case, study of the estimated
autocorrelation function and partial autocorrelation function of $\hat n_t$ can lead to identification
of the noise model.
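
The first of these two routes, subtracting a truncated set of impulse response weights from the output, might be sketched as follows in Python. The function name and the tiny data set are hypothetical; the truncation at the last supplied weight is an assumption of the sketch, since in practice the $\hat v_k$ are only available up to some finite lag.

```python
def noise_from_impulse_weights(y, x, v):
    """Estimate n_t = y_t - sum_{j=0}^{K} v_j * x_{t-j} for t = K, ..., len(y)-1.

    y, x : output and input series (equal length, zero-based lists)
    v    : impulse response weights [v_0, v_1, ..., v_K]
    The first K values are skipped because x_{t-j} is not available for them.
    """
    last_lag = len(v) - 1
    noise = []
    for t in range(last_lag, len(y)):
        fitted = sum(v[j] * x[t - j] for j in range(last_lag + 1))
        noise.append(y[t] - fitted)
    return noise


# Made-up illustration: a pure one-period delay with gain 0.5.
x_demo = [1.0, 2.0, 3.0, 4.0]
y_demo = [0.1, 0.6, 1.1, 1.4]
n_hat = noise_from_impulse_weights(y_demo, x_demo, [0.0, 0.5])
print([round(value, 2) for value in n_hat])
```

The autocorrelation and partial autocorrelation functions of the resulting series would then be examined in the usual way.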

It is also possible to identify the noise using the correlation functions for the input and
output, after prewhitening, in the following way. Suppose that the input could be exactly
prewhitened to give

$$\beta_t = v(B)\alpha_t + \varepsilon_t \tag{12.2.16}$$

where the known relationship

$$\varepsilon_t = \theta_x^{-1}(B)\phi_x(B) n_t \tag{12.2.17}$$

would link $\varepsilon_t$ and $n_t$. If a stochastic model could be found for $\varepsilon_t$, then, using (12.2.17),
a model could be deduced for $n_t$ and hence for $N_t$. If we now write $v(B)\alpha_t = u_t$, so that
$\beta_t = u_t + \varepsilon_t$, and provided that our independence assumption concerning $x_t$ and $n_t$, and
hence concerning $u_t$ and $\varepsilon_t$, is justified, we can write

$$\gamma_{\beta\beta}(k) = \gamma_{uu}(k) + \gamma_{\varepsilon\varepsilon}(k) \tag{12.2.18}$$

Since $\alpha_t$ is white noise, $\gamma_{uu}(k)$ may be obtained using the result (3.1.8), which gives the
autocorrelation function of a linear process. Thus,

$$\gamma_{uu}(k) = \sigma_\alpha^2 \sum_{j=0}^{\infty} v_j v_{j+k} = \frac{1}{\sigma_\alpha^2} \sum_{j=0}^{\infty} \gamma_{\alpha\beta}(j)\,\gamma_{\alpha\beta}(j + k)$$

using (12.2.10). Hence, using (12.2.18), the autocovariances of $\varepsilon_t$ may be obtained from
$\gamma_{\varepsilon\varepsilon}(k) = \gamma_{\beta\beta}(k) - \gamma_{uu}(k)$, with autocorrelations

$$\rho_{\varepsilon\varepsilon}(k) = \frac{\gamma_{\varepsilon\varepsilon}(k)}{\gamma_{\varepsilon\varepsilon}(0)} = \frac{\rho_{\beta\beta}(k) - \gamma_{uu}(k)/\gamma_{\beta\beta}(0)}{1 - \gamma_{uu}(0)/\gamma_{\beta\beta}(0)} = \frac{\rho_{\beta\beta}(k) - \sum_{j=0}^{\infty} \rho_{\alpha\beta}(j)\,\rho_{\alpha\beta}(j + k)}{1 - \sum_{j=0}^{\infty} \rho_{\alpha\beta}^2(j)}$$

Now, in practice, it is necessary to estimate the prewhitening transformation. Having made
the approximate prewhitening transformation, rough values for $\rho_{\varepsilon\varepsilon}(k)$ could be obtained
by substituting the estimates $r_{\alpha\beta}(j)$ of the cross-correlation function between transformed
input and output and $r_{\beta\beta}(j)$ of the autocorrelation function of the transformed output.

TABLE 12.2 Estimated Autocorrelation and Partial Autocorrelation Functions of the Noise
in Gas Furnace Data

  k    Autocorrelation    Partial autocorrelation $\hat\phi_{kk}$
  1     0.89               0.89
  2     0.71              -0.43
  3     0.51              -0.13
  4     0.32               0.02
  5     0.17               0.04
  6     0.07              -0.02
  7     0.01              -0.02
  8    -0.03               0.01
  9    -0.05              -0.01
 10    -0.04               0.08
 11    -0.03              -0.06
 12    -0.03
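
The expression for $\rho_{\varepsilon\varepsilon}(k)$ in terms of the correlations of the prewhitened series can be evaluated with a short Python sketch such as the one below. It is a rough illustration under stated assumptions: the function name is hypothetical, the infinite sums are truncated at the last supplied cross-correlation, and the numbers in the example are made up rather than taken from the gas furnace data.

```python
def noise_autocorrelation(r_alpha_beta, r_beta_beta, k):
    """Rough value of rho_ee(k) from sample correlations of the prewhitened series.

    r_alpha_beta : cross-correlations at lags 0, 1, ..., J-1 (sums truncated at J-1)
    r_beta_beta  : autocorrelations of the transformed output, lag 0 first (value 1)
    k            : lag, with 0 <= k < len(r_beta_beta)
    """
    n_lags = len(r_alpha_beta)
    cross_sum = sum(r_alpha_beta[j] * r_alpha_beta[j + k] for j in range(n_lags - k))
    explained = sum(r * r for r in r_alpha_beta)  # estimates gamma_uu(0)/gamma_bb(0)
    return (r_beta_beta[k] - cross_sum) / (1.0 - explained)


# Made-up illustration with only two cross-correlation lags.
rho_0 = noise_autocorrelation([0.0, 0.5], [1.0, 0.2], 0)
rho_1 = noise_autocorrelation([0.0, 0.5], [1.0, 0.2], 1)
print(round(rho_0, 3), round(rho_1, 3))
```

At lag zero the numerator and denominator coincide, so the function returns 1, as an autocorrelation must.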

Application to the Gas Furnace Example. Table 12.2 shows the first 12 values of
the sample autocorrelations and partial autocorrelations of the noise series $\hat N_t = Y_t - \hat{\mathcal{Y}}_t$,
where $\hat{\mathcal{Y}}_t = \hat\delta^{-1}(B)\hat\omega(B) X_{t-3}$ is computed as in (12.2.15) using the preliminary estimates
for the transfer function model obtained previously. That is, the values are computed as

$$\hat{\mathcal{Y}}_t = 0.57\hat{\mathcal{Y}}_{t-1} - (0.53X_{t-3} + 0.33X_{t-4} + 0.51X_{t-5})$$

The partial autocorrelations of $\hat N_t$ indicate that a second-order autoregressive model might
be an adequate representation, and the least-squares estimates obtained from the $\hat N_t$ values
for the AR(2) model yield

$$(1 - 1.54B + 0.64B^2)\hat N_t = a_t \tag{12.2.19}$$

with $\hat\sigma_a^2 = 0.057$.

Thus, the analysis of this section and Section 12.1.2 suggests the identification

$$Y_t = \frac{\omega_0 - \omega_1 B - \omega_2 B^2}{1 - \delta_1 B - \delta_2 B^2}\, X_{t-3} + \frac{1}{1 - \phi_1 B - \phi_2 B^2}\, a_t \tag{12.2.20}$$

for the gas furnace model. Furthermore, the initial estimates $\hat\omega_0 = -0.53$, $\hat\omega_1 = 0.33$, $\hat\omega_2 =
0.51$, $\hat\delta_1 = 0.57$, $\hat\delta_2 = 0.02$, $\hat\phi_1 = 1.54$, and $\hat\phi_2 = -0.64$ can be used as rough starting values
for the nonlinear estimation procedures that we describe in Section 12.3.
12.2.4 Some General Considerations in Identifying Transfer Function Models

Some general remarks can now be made concerning the procedure for identifying transfer 
function and noise models that we have just described. 


1. For many practical situations, when the effect of noise is appreciable, a delayed first- 
or second-order system such as that given by (12.2.13), or some simplification of it, 
would often provide as elaborate a model as could be justified for the data. 

2. Efficient estimation is only possible assuming the model form to be known. The
estimates $\hat v_k$ given by (12.2.12) are therefore, in general, necessarily inefficient. They
are employed at the identification stage because they are easily computed and can
indicate a form of model worthy to be fitted by more elaborate means.

3. Even if these were efficient estimates, the number of $v$'s required to trace out the
impulse response function fully would typically be considerably larger than the
number of parameters in a transfer function model. In cases where the $\delta$'s and $\omega$'s
in an adequate transfer function model could be estimated accurately, nevertheless,
the estimates of the corresponding $v$'s could have large variances and be highly
correlated.

4. The variance of

$$r_{\alpha\beta}(k) = \hat v_k \frac{s_\alpha}{s_\beta}$$

is of order $1/n$. Thus, we can expect that the estimates $r_{\alpha\beta}(k)$ and hence the $\hat v_k$ will
be buried in noise unless $\sigma_\alpha$ is reasonably large compared with the residual noise,
or unless $n$ is large. Thus, the identification procedure requires the variation in the
input $X_t$ to be reasonably large compared with the variation due to the noise and/or a
large volume of data is available. These requirements are satisfied by the gas furnace
data for which, as we show in Section 12.3, the initial identification is remarkably
good. When these requirements are not satisfied, the identification procedure may
fail. Usually, this will mean that only very rough estimates are possible with the
available data. However, some kind of rudimentary modeling may be possible by
postulating a plausible but simple transfer function/noise model, fitting directly by the
least-squares procedures of the next section, and applying diagnostic checks leading
to elaboration of the model when this proves necessary.

5. It should, perhaps, be emphasized that the prewhitened series $\alpha_t$ and $\beta_t$, and their
cross-correlation function, $r_{\alpha\beta}(k)$, in particular, are used only for the purpose of
identification of the form of the transfer function model. Once the model form is
identified, the original series $X_t$ and $Y_t$, not the prewhitened series, are used for
parameter estimation, forecasting, and so on.

6. An alternative method for identification of the transfer function-noise model was
proposed by Haugh and Box (1977), and similar ideas were also discussed by Priestley
(1981, Chapter 9). The method, which might be referred to as “double prewhitening,”
involves prewhitening both input and output series. That is, separate univariate
ARIMA models are built for both the input and the output processes, and then the
cross-correlation structure of the resulting (univariate white noise) residuals from
these models is examined. However, while sometimes useful, this procedure can
become overly complicated in terms of the final model specified, due to the use of
two sets of prewhitening factors.

7. The above discussion has focused on transfer function models with a single input
variable $X_t$. An alternative method of identifying transfer function models, which
readily generalizes to deal with multiple inputs, is given in Appendix A12.1. Transfer 
function models can also be specified using methods developed for multivariate time 
series analysis as demonstrated by Tiao and Box (1981). A discussion of such methods 
is given in Chapter 14. 

Lack of Uniqueness of the Model. Suppose that a particular dynamic system is represented
by the model

$$Y_t = \delta^{-1}(B)\omega(B) X_{t-b} + \phi^{-1}(B)\theta(B) a_t \tag{12.2.21}$$

Then it could equally well be represented by

$$L(B) Y_t = L(B)\delta^{-1}(B)\omega(B) X_{t-b} + L(B)\phi^{-1}(B)\theta(B) a_t \tag{12.2.22}$$

where $L(B)$ could be an arbitrary common factor, and hence would be redundant. Similar to
the discussion in Section 7.3.5 on parameter redundancy for ARMA models, for uniqueness
of model parameterization in (12.2.21) it is clear that the possibility of common factors
in the operators $\delta(B)$ and $\omega(B)$, or in the $\phi(B)$ and $\theta(B)$ operators, must be avoided. The
chance that we may iterate toward a model of unnecessarily complicated form is reduced
if we base our strategy on the following considerations:

1. Since rather simple transfer function models of first or second order, with or without 
delay, are often adequate, iterative model building should begin with a fairly simple 
model, looking for further simplification if this is possible, and reverting to more 
complicated models only as the need is demonstrated. 

2. One should always be on the lookout for the possibility of removing a factor common
to two or more of the operators on $Y_t$, $X_t$, and $a_t$. In practice, we will be dealing with
estimated coefficients, which may be subject to rather large sampling errors, so that 
only approximate common factors in the factorizations can be expected. Thus, a very 
careful analysis may be needed to detect such factors. Of course, having removed 
what appears to be a common factor, the model can be refitted and checked to show 
whether the simplification can be justified. 

3. When simplification by factorization is possible, but is overlooked, the least-squares 
estimation procedure may become extremely unstable since the minimum will tend 
to lie on a line or surface in the parameter space rather than at a point. Conversely, 
instability in the solution can point to the possibility of simplification of the model. 
As noted earlier, one reason for carrying out the identification procedure before 
fitting the model is to avoid redundancy or, conversely, to achieve parsimony in 
parameterization. 
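
The redundancy introduced by a common factor can be seen numerically: multiplying both $\delta(B)$ and $\omega(B)$ by the same $L(B)$ leaves the impulse response weights unchanged. The Python sketch below illustrates this for the tentative gas furnace operators. The helper names and the factor $L(B) = 1 - 0.5B$ are assumptions made for the illustration; operators are stored as lists of polynomial coefficients in $B$, so that $\delta(B) = 1 - \delta_1 B - \delta_2 B^2$ is written `[1, -delta1, -delta2]`.

```python
def poly_mul(a, c):
    """Multiply two polynomials in B given as coefficient lists (constant first)."""
    out = [0.0] * (len(a) + len(c) - 1)
    for i, a_i in enumerate(a):
        for j, c_j in enumerate(c):
            out[i + j] += a_i * c_j
    return out


def impulse_weights(delta_poly, omega_poly, n_weights):
    """First n_weights coefficients of v(B) = omega(B) / delta(B).

    Obtained by equating coefficients of B^k in delta(B) v(B) = omega(B).
    """
    v = []
    for k in range(n_weights):
        acc = omega_poly[k] if k < len(omega_poly) else 0.0
        for i in range(1, min(k, len(delta_poly) - 1) + 1):
            acc -= delta_poly[i] * v[k - i]
        v.append(acc / delta_poly[0])
    return v


delta_poly = [1.0, -0.57, -0.02]    # 1 - 0.57B - 0.02B^2
omega_poly = [-0.53, -0.33, -0.51]  # -(0.53 + 0.33B + 0.51B^2)
common = [1.0, -0.5]                # hypothetical redundant factor L(B) = 1 - 0.5B

v_simple = impulse_weights(delta_poly, omega_poly, 8)
v_redundant = impulse_weights(poly_mul(common, delta_poly),
                              poly_mul(common, omega_poly), 8)
print([round(w, 3) for w in v_simple])
print([round(w, 3) for w in v_redundant])
```

The two lists agree, although the second parameterization carries two extra coefficients. The weights here are counted from the first nonzero lag, that is, after the delay $b$ has been removed.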

Remark. If the operator $L(B)$ in (12.2.22) were set equal to $\phi(B)\delta(B)$, we would obtain

$$\phi(B)\delta(B) Y_t = \phi(B)\omega(B) X_{t-b} + \delta(B)\theta(B) a_t \tag{12.2.23}$$

which can be written as

$$\delta^*(B) Y_t = \omega^*(B) X_{t-b} + \theta^*(B) a_t \tag{12.2.23a}$$

Models of the general form of (12.2.23a) have been referred to as ARMAX models in
the econometric literature (e.g., Hannan and Deistler, 1988; Hannan et al., 1979; Reinsel,
1979). As can be seen, care is needed to avoid the occurrence of common factors among
the operators in this form.


12.3 FITTING AND CHECKING TRANSFER FUNCTION MODELS 
12.3.1 Conditional Sum-of-Squares Function 

We now consider the problem of efficiently and simultaneously estimating the parameters
$b$, $\boldsymbol{\delta}$, $\boldsymbol{\omega}$, $\boldsymbol{\phi}$, and $\boldsymbol{\theta}$ in the tentatively identified model

$$y_t = \delta^{-1}(B)\omega(B) x_{t-b} + n_t \tag{12.3.1}$$

where $y_t = \nabla^d Y_t$, $x_t = \nabla^d X_t$, and $n_t = \nabla^d N_t$ are all stationary processes and

$$n_t = \phi^{-1}(B)\theta(B) a_t \tag{12.3.2}$$

It is assumed that $n = N - d$ pairs of values are available for the analysis and that $Y_t$ and
$X_t$ ($y_t$ and $x_t$ if $d > 0$) denote deviations from expected values. These expected values may
be estimated along with the other parameters, but for the lengths of time series normally
worth analyzing it will usually be sufficient to use the sample means as estimates. When
$d > 0$, it will frequently be true that expected values for $y_t$ and $x_t$ are zero.

If starting values $\mathbf{x}_0$, $\mathbf{y}_0$, and $\mathbf{a}_0$ prior to the commencement of the series were available,
then given the data, for any choice of the parameters $(b, \boldsymbol{\delta}, \boldsymbol{\omega}, \boldsymbol{\phi}, \boldsymbol{\theta})$ and of the starting values
$(\mathbf{x}_0, \mathbf{y}_0, \mathbf{a}_0)$ we could calculate, successively, values of

$$a_t = a_t(b, \boldsymbol{\delta}, \boldsymbol{\omega}, \boldsymbol{\phi}, \boldsymbol{\theta} \mid \mathbf{x}_0, \mathbf{y}_0, \mathbf{a}_0)$$

for $t = 1, 2, \ldots, n$. Under the normal assumption for the $a_t$'s, a close approximation to
the maximum likelihood estimates of the parameters can be obtained by minimizing the
conditional sum-of-squares function,

$$S_0(b, \boldsymbol{\delta}, \boldsymbol{\omega}, \boldsymbol{\phi}, \boldsymbol{\theta}) = \sum_{t=1}^{n} a_t^2(b, \boldsymbol{\delta}, \boldsymbol{\omega}, \boldsymbol{\phi}, \boldsymbol{\theta} \mid \mathbf{x}_0, \mathbf{y}_0, \mathbf{a}_0) \tag{12.3.3}$$


Three-Stage Procedure for Calculating the a’s. Given appropriate starting values, the
generation of the $a_t$'s for any particular choice of the parameter values may be
accomplished using the following three-stage procedure.

First, the output $\mathcal{Y}_t$ from the transfer function model may be computed from

$$\mathcal{Y}_t = \delta^{-1}(B)\omega(B) x_{t-b}$$

that is, from

$$\delta(B)\mathcal{Y}_t = \omega(B) x_{t-b}$$

or from

$$\mathcal{Y}_t - \delta_1 \mathcal{Y}_{t-1} - \cdots - \delta_r \mathcal{Y}_{t-r} = \omega_0 x_{t-b} - \omega_1 x_{t-b-1} - \cdots - \omega_s x_{t-b-s} \tag{12.3.4}$$

Having calculated the $\mathcal{Y}_t$ series, then using (12.3.1), the noise series $n_t$ can be obtained
from

$$n_t = y_t - \mathcal{Y}_t \tag{12.3.5}$$

Finally, the $a_t$'s can be obtained from (12.3.2) written in the form

$$\theta(B) a_t = \phi(B) n_t$$

that is,

$$a_t = \theta_1 a_{t-1} + \cdots + \theta_q a_{t-q} + n_t - \phi_1 n_{t-1} - \cdots - \phi_p n_{t-p} \tag{12.3.6}$$

Starting Values. As discussed in Section 7.1.3 for stochastic model estimation, the effect
of transients can be minimized if the difference equations are started off from a value of $t$ for
which all previous $x_t$'s and $y_t$'s are known. Thus, $\mathcal{Y}_t$ in (12.3.4) is calculated from $t = u + 1$
onward, where $u$ is the larger of $r$ and $s + b$. This means that $n_t$ will be available from
$n_{u+1}$ onward; hence, if unknown $a_t$'s are set equal to their unconditional expected values of
zero, the $a_t$'s may be calculated from $a_{u+p+1}$ onward. Thus, the conditional sum-of-squares
function is

$$S_0(b, \boldsymbol{\delta}, \boldsymbol{\omega}, \boldsymbol{\phi}, \boldsymbol{\theta}) = \sum_{t=u+p+1}^{n} a_t^2(b, \boldsymbol{\delta}, \boldsymbol{\omega}, \boldsymbol{\phi}, \boldsymbol{\theta} \mid \mathbf{x}_0, \mathbf{y}_0, \mathbf{a}_0) \tag{12.3.7}$$


Example Using the Gas Furnace Data. For these data, the model (12.2.20), namely

$$Y_t = \frac{\omega_0 - \omega_1 B - \omega_2 B^2}{1 - \delta_1 B - \delta_2 B^2}\, X_{t-3} + \frac{1}{1 - \phi_1 B - \phi_2 B^2}\, a_t$$

has been identified. Equations (12.3.4), (12.3.5), and (12.3.6) then become

$$\mathcal{Y}_t = \delta_1 \mathcal{Y}_{t-1} + \delta_2 \mathcal{Y}_{t-2} + \omega_0 X_{t-3} - \omega_1 X_{t-4} - \omega_2 X_{t-5} \tag{12.3.8}$$

$$N_t = Y_t - \mathcal{Y}_t \tag{12.3.9}$$

$$a_t = N_t - \phi_1 N_{t-1} - \phi_2 N_{t-2} \tag{12.3.10}$$

Thus, (12.3.8) can be used to generate $\mathcal{Y}_t$ from $t = 6$ onward and (12.3.10) to generate $a_t$
from $t = 8$ onward. The slight loss of information that results will not be important for a
sufficiently long length of series. For example, since $N = 296$ for the gas furnace data, the
loss of seven values at the beginning of the series is of little practical consequence.

In the example above, we have assumed that $b = 3$. To estimate $b$, the values of $\boldsymbol{\delta}$, $\boldsymbol{\omega}$, $\boldsymbol{\phi}$,
and $\boldsymbol{\theta}$, which minimize the conditional sum of squares, can be calculated for each value of
$b$ in the likely range and the overall minimum with respect to $b$, $\boldsymbol{\delta}$, $\boldsymbol{\omega}$, $\boldsymbol{\phi}$, and $\boldsymbol{\theta}$ obtained.
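
The three-stage calculation and the conditional sum of squares (12.3.7) might be programmed along the following lines. This Python sketch is an illustration, not the routine used for the results in this chapter: the function name is hypothetical, series are zero-based lists (so the text's $t = u + 1$ corresponds to index `u`), and values of the transfer function output and of the $a_t$'s before the starting point are set to zero.

```python
def conditional_sum_of_squares(y, x, delta, omega, phi, theta, b):
    """Three-stage computation of the residuals and of the sum of squares (12.3.7).

    delta = [delta_1..delta_r], omega = [omega_0..omega_s] with
    omega(B) = omega_0 - omega_1 B - ... - omega_s B^s,
    phi = [phi_1..phi_p], theta = [theta_1..theta_q], b = delay.
    """
    r, s, p, q = len(delta), len(omega) - 1, len(phi), len(theta)
    n = len(y)
    u = max(r, s + b)

    # Stage 1: transfer function output from (12.3.4), started at index u.
    out = [0.0] * n
    for t in range(u, n):
        out[t] = (sum(delta[i] * out[t - 1 - i] for i in range(r))
                  + omega[0] * x[t - b]
                  - sum(omega[j] * x[t - b - j] for j in range(1, s + 1)))

    # Stage 2: noise series (12.3.5); only indices >= u are meaningful.
    noise = [y[t] - out[t] for t in range(n)]

    # Stage 3: residuals from (12.3.6), with unknown earlier a's taken as zero.
    a = [0.0] * n
    for t in range(u + p, n):
        a[t] = (noise[t]
                - sum(phi[g] * noise[t - 1 - g] for g in range(p))
                + sum(theta[h] * a[t - 1 - h] for h in range(q) if t - 1 - h >= 0))

    residuals = a[u + p:]
    return sum(e * e for e in residuals), residuals


# Orders of the gas furnace model: r = 2, s = 2, b = 3, p = 2, q = 0.
# A placeholder series of zeros is used here only to show how many residuals result.
placeholder = [0.0] * 296
ss, resid = conditional_sum_of_squares(placeholder, placeholder,
                                       [0.57, 0.02], [-0.53, 0.33, 0.51],
                                       [1.54, -0.64], [], 3)
print(len(resid))
```

With these orders, seven values are lost at the start of the series, in agreement with the remark above. To estimate $b$, a function of this kind would be minimized over the remaining parameters for each candidate delay.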

12.3.2 Nonlinear Estimation 

A nonlinear least-squares algorithm, analogous to that given for fitting the stochastic model
in Section 7.2.4, can be used to obtain the least-squares estimates and their approximate
standard errors. The algorithm will behave well when the sum-of-squares function is
roughly quadratic. However, the procedure can sometimes run into trouble, in particular if
the parameter estimates are very highly correlated (if, for example, the model approaches
singularity due to near-common factors in the factorizations of the operators), or, in some
cases, if estimates are near a boundary of the permissible parameter space. In difficult cases,
the estimation situation may be clarified by plotting sum-of-squares contours for selected
two-dimensional sections of the parameter space.

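The nonlinear least-squares algorithm can be implemented as follows: At any stage
of the iteration, and for some fixed value of the delay parameter $b$, let the best guesses
available for the remaining parameters be denoted by

$$\boldsymbol{\beta}_0' = (\delta_{1,0}, \ldots, \delta_{r,0};\ \omega_{0,0}, \ldots, \omega_{s,0};\ \phi_{1,0}, \ldots, \phi_{p,0};\ \theta_{1,0}, \ldots, \theta_{q,0})$$

Now let $a_{t,0}$ denote that value of $a_t$ computed from the model, as in Section 12.3.1, for the
guessed parameter values $\boldsymbol{\beta}_0$ and denote the negative of the derivatives of $a_t$ with respect
to the parameters as follows:

$$
d_{i,t}^{(\delta)} = -\left.\frac{\partial a_t}{\partial \delta_i}\right|_{\boldsymbol{\beta}_0} \qquad
d_{j,t}^{(\omega)} = -\left.\frac{\partial a_t}{\partial \omega_j}\right|_{\boldsymbol{\beta}_0} \qquad
d_{g,t}^{(\phi)} = -\left.\frac{\partial a_t}{\partial \phi_g}\right|_{\boldsymbol{\beta}_0} \qquad
d_{h,t}^{(\theta)} = -\left.\frac{\partial a_t}{\partial \theta_h}\right|_{\boldsymbol{\beta}_0}
\tag{12.3.11}
$$

Then a Taylor series expansion of $a_t = a_t(\boldsymbol{\beta})$ about parameter values $\boldsymbol{\beta} = \boldsymbol{\beta}_0$ can be
rearranged in the form

$$
a_{t,0} \simeq \sum_{i=1}^{r} (\delta_i - \delta_{i,0})\, d_{i,t}^{(\delta)} + \sum_{j=0}^{s} (\omega_j - \omega_{j,0})\, d_{j,t}^{(\omega)}
+ \sum_{g=1}^{p} (\phi_g - \phi_{g,0})\, d_{g,t}^{(\phi)} + \sum_{h=1}^{q} (\theta_h - \theta_{h,0})\, d_{h,t}^{(\theta)} + a_t
\tag{12.3.12}
$$

We proceed as in Section 7.2 to obtain adjustments $\delta_i - \delta_{i,0}$, $\omega_j - \omega_{j,0}$, and so on, by fitting
this linearized equation by standard linear least-squares. By adding the adjustments to the
first guesses $\boldsymbol{\beta}_0$, a set of second guesses can be formed and the procedure repeated until
convergence is reached.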

The derivatives in (12.3.11) may be computed recursively. However, it seems simplest 
to work with a standard nonlinear least-squares computer program in which derivatives are 
determined numerically and an option is available of “constrained iteration” to prevent 
instability. It is then necessary only to program the computation of a t itself. 
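
A bare-bones version of such a program is sketched below in Python, under stated assumptions: derivatives are approximated by forward differences, the linearized equation (12.3.12) is fitted through the normal equations, and no constrained iteration or convergence test is included. All function names are hypothetical. The example residual function is not a transfer function-noise model; it fits the geometric decay $\omega\delta^t$ to made-up, noise-free values simply to show the iteration at work.

```python
def numeric_neg_derivatives(residual_fn, beta, step=1e-6):
    """Residuals at beta and forward-difference estimates of -d a_t / d beta_i."""
    base = residual_fn(beta)
    columns = []
    for i in range(len(beta)):
        bumped = list(beta)
        bumped[i] += step
        shifted = residual_fn(bumped)
        columns.append([-(s - b0) / step for s, b0 in zip(shifted, base)])
    return base, columns


def solve_linear(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda row: abs(m[row][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for row in range(c + 1, n):
            factor = m[row][c] / m[c][c]
            for k in range(c, n + 1):
                m[row][k] -= factor * m[c][k]
    sol = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * sol[k] for k in range(row + 1, n))
        sol[row] = (m[row][n] - tail) / m[row][row]
    return sol


def gauss_newton(residual_fn, beta0, iterations=20):
    """Repeatedly regress a_{t,0} on the d's, as in (12.3.12), and update the guesses."""
    beta = list(beta0)
    for _ in range(iterations):
        a0, d = numeric_neg_derivatives(residual_fn, beta)
        k, n_obs = len(beta), len(a0)
        xtx = [[sum(d[i][t] * d[j][t] for t in range(n_obs)) for j in range(k)]
               for i in range(k)]
        xta = [sum(d[i][t] * a0[t] for t in range(n_obs)) for i in range(k)]
        adjustment = solve_linear(xtx, xta)
        beta = [b + c for b, c in zip(beta, adjustment)]
    return beta


# Made-up, noise-free data following omega * delta^t with delta = 0.5, omega = 2.
observed = [2.0 * 0.5 ** t for t in range(10)]


def example_residuals(beta):
    delta, omega = beta
    return [y - omega * delta ** t for t, y in enumerate(observed)]


estimate = gauss_newton(example_residuals, [0.4, 1.5])
print([round(value, 4) for value in estimate])
```

For a transfer function-noise model, the residual function would be the three-stage calculation of Section 12.3.1 for a fixed delay $b$.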

The covariance matrix of the estimates may be obtained from the converged value of
the matrix

$$(\mathbf{X}_{\hat\beta}' \mathbf{X}_{\hat\beta})^{-1} \hat\sigma_a^2 \simeq \mathrm{cov}[\hat{\boldsymbol{\beta}}]$$

as described in Section 7.2.2; in addition, the least-squares estimates $\hat{\boldsymbol{\beta}}$ have been shown
to have a multivariate normal asymptotic distribution (e.g., Pierce, 1972a; Reinsel, 1979).
If the delay $b$, which is an integer, needs to be estimated, the iteration may be run to
convergence for a series of values of $b$ and the value of $b$ giving the minimum sum of
squares selected. One special feature (see, for example, Pierce, 1972a) of the covariance
matrix of the least-squares estimates $\hat{\boldsymbol{\beta}}$ is that it will be approximately a block diagonal
matrix whose two blocks on the diagonal consist of the covariance matrices of the
parameters $(\hat{\boldsymbol{\delta}}', \hat{\boldsymbol{\omega}}') = (\hat\delta_1, \ldots, \hat\delta_r, \hat\omega_0, \ldots, \hat\omega_s)$ and $(\hat{\boldsymbol{\phi}}', \hat{\boldsymbol{\theta}}') = (\hat\phi_1, \ldots, \hat\phi_p, \hat\theta_1, \ldots, \hat\theta_q)$, respectively.
Thus, the parameter estimates of the transfer function part of the model are approximately
uncorrelated with the estimates of the noise part of the model, which results from
the assumed independence between the input $X_t$ and the white noise $a_t$ in the model.

More exact sum-of-squares and exact likelihood function methods could also be employed
in the estimation of the transfer function-noise models, as in the case of the ARMA
models discussed in Chapter 7 (see, e.g., Newbold, 1973). The state-space model Kalman
filtering and innovations algorithm approach to the exact likelihood evaluation discussed
in Section 7.4 could also be used. However, for moderate and large $n$ and nonseasonal data,
there will generally be little difference between the conditional and exact methods.

Remark. Commercially available software packages such as SAS and SCA include algorithms
for estimating the parameters in transfer function-noise models. The software
package R can also be used for model fitting. In particular, the newly released package
MTS for multivariate time series analysis that we will use in Chapter 14 has a function
tfm1() that fits a transfer function-noise model to a dataset with a single input variable X.
A demonstration of this package is given in Section 12.4.1. A second function tfm2() fits
a model with two input variables to the data.


12.3.3 Use of Residuals for Diagnostic Checking 

Serious model inadequacy can usually be detected by examining

1. The autocorrelation function $r_{\hat a\hat a}(k)$ of the residuals $\hat a_t = a_t(\hat b, \hat{\boldsymbol{\delta}}, \hat{\boldsymbol{\omega}}, \hat{\boldsymbol{\phi}}, \hat{\boldsymbol{\theta}})$ from the
fitted model.

2. Certain cross-correlation functions involving input and residuals: in particular, the
cross-correlation function $r_{\alpha\hat a}(k)$ between prewhitened input $\alpha_t$ and the residuals $\hat a_t$.

Suppose, if necessary after suitable differencing, that the model can be written as

$$
\begin{aligned}
y_t &= \delta^{-1}(B)\omega(B) x_{t-b} + \phi^{-1}(B)\theta(B) a_t \\
    &= v(B) x_t + \psi(B) a_t
\end{aligned}
\tag{12.3.13}
$$

Now, suppose that we select an incorrect model leading to residuals $a_{0t}$, where

$$y_t = v_0(B) x_t + \psi_0(B) a_{0t}$$

Then

$$a_{0t} = \psi_0^{-1}(B)[v(B) - v_0(B)] x_t + \psi_0^{-1}(B)\psi(B) a_t \tag{12.3.14}$$

Thus, it is apparent in general that if a wrong model is selected, the $a_{0t}$'s will be
autocorrelated and the $a_{0t}$'s will be cross-correlated with the $x_t$'s and hence with the $\alpha_t$'s, which
generate the $x_t$'s.

Now consider what happens in two special cases: (1) when the transfer function model 
is correct but the noise model is incorrect, and (2) when the transfer function model is 
incorrect. 
Transfer Function Model Correct, Noise Model Incorrect. If $v_0(B) = v(B)$ but $\psi_0(B) \neq
\psi(B)$, then (12.3.14) becomes

$$a_{0t} = \psi_0^{-1}(B)\psi(B) a_t \tag{12.3.15}$$

Therefore, the $a_{0t}$'s would not be cross-correlated with $x_t$'s or with $\alpha_t$'s. However, the
$a_{0t}$ process would be autocorrelated, and the form of the autocorrelation function could
indicate appropriate modification of the noise structure, as discussed for univariate ARIMA
models in Section 8.3.

Transfer Function Model Incorrect. From (12.3.14) it is apparent that if the transfer
function model were incorrect, not only would the $a_{0t}$'s be cross-correlated with the $x_t$'s
(and $\alpha_t$'s), but also the $a_{0t}$'s would be autocorrelated. This would be true even if the noise
model were correct, for then (12.3.14) would become

$$a_{0t} = \psi^{-1}(B)[v(B) - v_0(B)] x_t + a_t \tag{12.3.16}$$

Whether or not the noise model was correct, a cross-correlation analysis could indicate the
modifications needed in the transfer function model. This aspect is clarified by considering
the model after prewhitening. If the output and the input are assumed to be transformed so
that the input is white noise, then, as in (12.2.8), we may write the model as

$$\beta_t = v(B)\alpha_t + \varepsilon_t$$

where $\beta_t = \theta_x^{-1}(B)\phi_x(B) y_t$ and $\varepsilon_t = \theta_x^{-1}(B)\phi_x(B) n_t$. Now, consider the quantities

$$\varepsilon_{0t} = \beta_t - v_0(B)\alpha_t$$

Since $\varepsilon_{0t} = [v(B) - v_0(B)]\alpha_t + \varepsilon_t$, arguing as in Section 12.1.1, the cross-correlations
between the $\varepsilon_{0t}$'s and the $\alpha_t$'s measure the discrepancy between the correct and incorrect
impulse functions. Specifically, as in (12.2.11),

$$v_k - v_{0k} = \frac{\rho_{\alpha\varepsilon_0}(k)\,\sigma_{\varepsilon_0}}{\sigma_\alpha} \qquad k = 0, 1, 2, \ldots \tag{12.3.17}$$

12.3.4 Specific Checks Applied to the Residuals 

In practice, we do not know the process parameters exactly but must apply our checks
to the residuals $\hat a_t$ computed after least-squares fitting. Even if the functional form of the
fitted model were adequate, the parameter estimates would differ somewhat from the true
values and the distribution of the autocorrelations of the residuals $\hat a_t$'s would also differ
to some extent from that of the autocorrelations of the $a_t$'s. Therefore, some caution is
necessary in using the results of the previous sections to suggest the behavior of residual
correlations. The brief discussion that follows is based in part on a more detailed study by
Pierce (1972b).

Autocorrelation Checks. Suppose that a transfer function-noise model has been fitted
by least squares, that the residuals $\hat a_t$'s have been calculated by substituting least-squares
estimates for the parameters, and that the estimated autocorrelation function $r_{\hat a\hat a}(k)$ of
these residuals has been computed. Then, as we have seen,

1. If the autocorrelation function $r_{\hat a\hat a}(k)$ shows marked correlation patterns, this suggests
model inadequacy.

2. If the cross-correlation checks do not indicate inadequacy of the transfer function
model, the inadequacy is probably in the fitted noise model $n_t = \psi_0(B) a_{0t}$.

In the latter case, identification of a subsidiary model

$$a_{0t} = T(B) a_t$$

to represent the correlation of the residuals from the primary model can, in accordance with
(12.3.15), indicate roughly the form

$$n_t = \psi_0(B) T(B) a_t$$

to take for the modified noise model. However, in making assessments of whether an
apparent discrepancy of estimated autocorrelations from zero is, or is not, likely to point
to a nonzero theoretical value, certain facts must be borne in mind analogous to those
discussed in Section 8.2.1.

Suppose that after allowing for starting values, m = n — u — p values of the a t ’s are 
actually available for this computation. Then if the model was correct in functional form 
and the true parameter values were substituted, the residuals would be white noise and the 
estimated autocorrelations would be distributed mutually independently about zero with 
variance 1/m. When estimates are substituted for the parameter values, the distributional 
properties of the estimated autocorrelations at low lags are affected. In particular, the 
variance of these estimated low-lag autocorrelations can be considerably less than 1 /m, 
and the values can be highly correlated. Thus, with k small, comparison of an estimated 
autocorrelation r ss (k) with a “standard error’’ 1 / \[m could greatly underestimate its sig¬ 
nificance, Also, ripples in the estimated autocorrelation function at low lags can arise simply 
because of the high induced correlation between these estimates. If the amplitude of such 
low-lag ripples is small compared with 1 / s/m, they could have arisen by chance alone and 
need not be indicative of some real pattern in the theoretical autocorrelations. 

A helpful overall check, which takes account of these distributional effects produced by 
fitting, is as follows. Consider the first $K$ estimated autocorrelations 
$r_{\hat{a}\hat{a}}(1), \ldots, r_{\hat{a}\hat{a}}(K)$ and let $K$ be taken sufficiently large so that if the model is 
written as $y_t = v(B)x_t + \psi(B)a_t$, the weights $\psi_j$ can be expected to be negligible for 
$j > K$. Then if the functional form of the model is adequate, the quantity 

$$Q = m \sum_{k=1}^{K} r_{\hat{a}\hat{a}}^2(k) \tag{12.3.18}$$

is approximately distributed as $\chi^2$ with $K - p - q$ degrees of freedom. Note that the 
degrees of freedom in $\chi^2$ depend on the number of parameters in the noise model but not 
on the number of parameters in the transfer function model. By referring $Q$ to a table of 
percentage points of $\chi^2$, we can obtain an approximate test of the hypothesis of model 
adequacy. However, in practice, the modified statistic 

$$\tilde{Q} = m(m+2) \sum_{k=1}^{K} (m-k)^{-1} r_{\hat{a}\hat{a}}^2(k) \tag{12.3.18a}$$

analogous to (8.2.3) of Section 8.2.2 for the ARIMA model, would be recommended instead 
of (12.3.18) because $\tilde{Q}$ provides a closer approximation to the chi-squared distribution than 
$Q$ under the null hypothesis of model adequacy. 
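As an illustration of (12.3.18) and (12.3.18a), the following minimal Python sketch computes both statistics and the associated degrees of freedom from a list of residual autocorrelations. The function name `portmanteau` and the numerical values are hypothetical and serve only to show the arithmetic; they are not taken from the gas furnace example.

```python
def portmanteau(r, m, n_noise_params):
    """Return (Q, Q_tilde, df) for residual autocorrelations r at lags 1..K.

    m is the effective number of residuals and n_noise_params = p + q is the
    number of parameters in the fitted noise model.
    """
    K = len(r)
    q = m * sum(rk ** 2 for rk in r)
    q_tilde = m * (m + 2) * sum(rk ** 2 / (m - k)
                                for k, rk in enumerate(r, start=1))
    return q, q_tilde, K - n_noise_params


# Hypothetical autocorrelations at lags 1..4 (illustrative values only).
r_demo = [0.10, -0.10, 0.05, 0.00]
Q, Q_tilde, df = portmanteau(r_demo, m=100, n_noise_params=2)
print(Q, Q_tilde, df)
```

The modified statistic weights each squared autocorrelation by $(m+2)/(m-k)$, so it is never smaller than $Q$; either value would then be referred to a $\chi^2$ table with the returned degrees of freedom.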

Cross-Correlation Check. As we have seen in Section 12.3.3, 

1. A pattern of markedly nonzero cross-correlations $r_{x\hat{a}}(k)$ suggests inadequacy of the 
transfer function model. 

2. A somewhat different cross-correlation analysis can suggest the type of modification 
needed in the transfer function model. Specifically, if the fitted transfer function 
is $\hat{v}_0(B)$ and we consider the cross-correlations between the quantities 
$\hat{\varepsilon}_{0t} = \beta_t - \hat{v}_0(B)\alpha_t$ and $\alpha_t$, rough estimates of the discrepancies $v_k - v_{0k}$ are given by 

$$\frac{r_{\alpha\hat{\varepsilon}_0}(k)\, s_{\hat{\varepsilon}_0}}{s_\alpha}$$

Suppose that the model were of the correct functional form and true parameter values 
had been substituted. The residuals would be white noise uncorrelated with the $x_t$'s and, 
using (12.1.11), the variance of the $r_{xa}(k)$ for an effective length of series $m$ would be 
approximately $1/m$. However, unlike the autocorrelations $r_{aa}(k)$, these cross-correlations 
will not be approximately uncorrelated. In general, if the $x_t$'s are autocorrelated, so are the 
cross-correlations $r_{xa}(k)$. In fact, as has been seen in (12.1.12), on the assumption that the 
$x_t$'s and the $a_t$'s have no cross-correlation, the correlation coefficient between $r_{xa}(k)$ and 
$r_{xa}(k + l)$ is 

$$\rho[r_{xa}(k), r_{xa}(k + l)] \simeq \rho_{xx}(l) \tag{12.3.19}$$

That is, approximately, the cross-correlations have the same autocorrelation function as does 
the original input series $x_t$. Thus, when the $x_t$'s are autocorrelated, a perfectly adequate 
transfer function model will give rise to estimated cross-correlations $r_{x\hat{a}}(k)$, which, although 
small in magnitude, may show pronounced patterns. This effect is eliminated if the check 
is made by computing cross-correlations $r_{\alpha\hat{a}}(k)$ with the prewhitened input $\alpha_t$. 

As with the autocorrelations, when estimates are substituted for parameter values, the 
distributional properties of the estimated cross-correlations are affected. However, a rough 
overall test of the hypothesis of model adequacy, similar to the autocorrelation test, can be 
obtained based on the magnitudes of the estimated cross-correlations. To employ the check, 
the cross-correlations $r_{\alpha\hat{a}}(k)$ for $k = 0, 1, 2, \ldots, K$ between the input $\alpha_t$ in prewhitened form 
and the residuals $\hat{a}_t$ are estimated, and $K$ is chosen sufficiently large so that the weights $v_j$ 
and $\psi_j$ in (12.3.13) can be expected to be negligible for $j > K$. The effects resulting from 
the use of estimated parameters in calculating residuals are, as before, principally confined 
to cross-correlations of low order whose variances are considerably less than $m^{-1}$ and that 
may be highly correlated even when the input is white noise. 

For an overall test, Pierce (1972b) showed that 

$$S = m \sum_{k=0}^{K} r_{\alpha\hat{a}}^2(k) \tag{12.3.20}$$

is approximately distributed as $\chi^2$ with $K + 1 - (r + s + 1)$ degrees of freedom, where 
$(r + s + 1)$ is the number of parameters fitted in the transfer function model. Note that the 
number of degrees of freedom is independent of the number of parameters fitted in the 
noise model. Based on studies of the behavior of the $Q$ statistic discussed in Chapter 8, 
the modified statistic $\tilde{S} = m(m+2) \sum_{k=0}^{K} (m-k)^{-1} r_{\alpha\hat{a}}^2(k)$ might be suggested for use in 
practice because it may more accurately approximate the $\chi^2$ distribution under the null 
model, although detailed investigations of its performance have not been made (however, 
see empirical results in Poskitt and Tremayne, 1981). 


12.4 SOME EXAMPLES OF FITTING AND CHECKING TRANSFER 
FUNCTION MODELS 

12.4.1 Fitting and Checking of the Gas Furnace Model 

We now illustrate the approach described in Section 12.2 to the fitting of the model 

$$Y_t = \frac{\omega_0 - \omega_1 B - \omega_2 B^2}{1 - \delta_1 B - \delta_2 B^2} X_{t-3} + \frac{1}{1 - \phi_1 B - \phi_2 B^2} a_t$$

which was identified for the gas furnace data in Sections 12.2.2 and 12.2.3. 

Nonlinear Estimation. Using the initial estimates $\hat{\omega}_0 = -0.53$, $\hat{\omega}_1 = 0.33$, $\hat{\omega}_2 = 0.51$, 
$\hat{\delta}_1 = 0.57$, $\hat{\delta}_2 = 0.02$, $\hat{\phi}_1 = 1.54$, and $\hat{\phi}_2 = -0.64$ derived in Sections 12.2.2 and 12.2.3 with the 
conditional least-squares algorithm described in Section 12.3.2, least-squares values, to 
two decimals, were achieved in four iterations. However, to test whether the results would 
converge in much less favorable circumstances, Table 12.3 shows the iterations produced 
with all starting values taken to be either +0.1 or -0.1. The fact that, even then, convergence 
was achieved in 10 iterations with as many as seven parameters in the model is encouraging. 

The last line in Table 12.3 shows the rough preliminary estimates obtained at the 
identification stage in Sections 12.2.2 and 12.2.3. It is seen that for this example, they are in 
close agreement with the least-squares estimates given on the previous line. Thus, the final 
fitted transfer function model is 

$$(1 - \underset{(\pm 0.21)}{0.57}B - \underset{(\pm 0.14)}{0.01}B^2)Y_t = -(\underset{(\pm 0.08)}{0.53} + \underset{(\pm 0.15)}{0.37}B + \underset{(\pm 0.16)}{0.51}B^2)X_{t-3} \tag{12.4.1}$$

and the fitted noise model is 

$$(1 - \underset{(\pm 0.05)}{1.53}B + \underset{(\pm 0.05)}{0.63}B^2)N_t = a_t \tag{12.4.2}$$

with $\hat{\sigma}_a^2 = 0.0561$, where the limits in parentheses are the $\pm 1$ standard error limits obtained 
from the nonlinear least-squares estimation procedure. 

TABLE 12.3 Convergence of Nonlinear Least-Squares Fit of Gas Furnace Data 

Iteration     $\omega_0$  $\omega_1$  $\omega_2$  $\delta_1$  $\delta_2$  $\phi_1$  $\phi_2$   Sum of Squares 
0              0.10   -0.10   -0.10    0.10    0.10    0.10    0.10     13,601.00 
1             -0.46    0.63    0.60    0.14    0.27    1.33   -0.27        273.10 
2             -0.52    0.45    0.31    0.40    0.52    1.37   -0.43         92.50 
3             -0.63    0.60    0.01    0.12    0.73    1.70   -0.76         31.80 
4             -0.54    0.50    0.29    0.24    0.42    1.70   -0.81         19.70 
5             -0.50    0.31    0.51    0.63    0.09    1.56   -0.68         16.84 
6             -0.53    0.38    0.53    0.54    0.01    1.54   -0.64         16.60 
7             -0.53    0.37    0.51    0.56    0.01    1.53   -0.63         16.60 
8             -0.53    0.37    0.51    0.56    0.01    1.53   -0.63         16.60 
9             -0.53    0.37    0.51    0.57    0.01    1.53   -0.63         16.60 
Preliminary 
estimates     -0.53    0.33    0.51    0.57    0.02    1.54   -0.64 

Diagnostic Checking. Before accepting the model above as an adequate representation of 
the system, autocorrelation and cross-correlation checks should be applied, as described in 
Section 12.3.4. The first 36 lags of the residual autocorrelations are given in Table 12.4(a) 
and plotted in Figure 12.6(a), together with their approximate two standard error limits 
$\pm 2/\sqrt{m} \simeq 0.12$ ($m = 289$) under the assumption that the model is adequate. There seems 
to be no evidence of model inadequacy from the behavior of individual autocorrelations. 
This is confirmed by calculating the $\tilde{Q}$ criterion in (12.3.18a), which is 

$$\tilde{Q} = (289)(291) \sum_{k=1}^{36} (289 - k)^{-1} r_{\hat{a}\hat{a}}^2(k) = 43.8$$

Comparison of $\tilde{Q}$ with the $\chi^2$ table for $K - p - q = 36 - 2 - 0 = 34$ degrees of freedom 
provides no grounds for questioning model adequacy. 

The first 36 lags of the cross-correlation function $r_{x\hat{a}}(k)$ between the input $X_t$ and the 
residuals $\hat{a}_t$ are given in Table 12.4(b) and shown in Figure 12.6(b), together with their 
approximate two standard error limits $\pm 2/\sqrt{m}$. It is seen that although the cross-correlations $r_{x\hat{a}}(k)$ 
do not exceed their two standard error limits, they are themselves highly autocorrelated. 
This is to be expected because as indicated by (12.3.19), the estimated cross-correlations 
follow the same stochastic process as does the input $X_t$, and as we have already seen, for 
this example the input was highly autocorrelated. 

The corresponding cross-correlations between the prewhitened input $\alpha_t$ and the residuals 
$\hat{a}_t$ are given in Table 12.4(c) and shown in Figure 12.6(c). The $\tilde{S}$ criterion yields 

$$\tilde{S} = (289)(291) \sum_{k=0}^{35} (289 - k)^{-1} r_{\alpha\hat{a}}^2(k) = 32.1$$

Comparison of $\tilde{S}$ with the $\chi^2$ table for $K + 1 - (r + s + 1) = 36 - 5 = 31$ degrees of 
freedom again provides no evidence that the model is inadequate. 

TABLE 12.4 Estimated Autocorrelation and Cross-Correlation Functions of Residuals from Fitted Gas Furnace Model 

(a) Autocorrelations $r_{\hat{a}\hat{a}}(k)$ 
Lag k                                                                                  Upper Bound to 
                                                                                       Standard Error 
1-12     0.02  0.06 -0.07 -0.05 -0.05  0.12  0.03  0.03 -0.08  0.05  0.02  0.10        ±0.06 
13-24   -0.04  0.05 -0.09 -0.01 -0.08  0.00 -0.12  0.00 -0.01  0.08  0.02 -0.01        ±0.06 
25-36    0.04 -0.02  0.02  0.09 -0.12  0.06 -0.03 -0.06  0.11  0.02  0.03  0.06        ±0.06 

(b) Cross-correlations $r_{x\hat{a}}(k)$ between the input and the output residuals 
0-11     0.00  0.00  0.00  0.00  0.00  0.00 -0.01 -0.02 -0.03 -0.05 -0.06 -0.05        ±0.06 
12-23   -0.03 -0.03 -0.03 -0.07 -0.10 -0.12 -0.12 -0.10 -0.04 -0.01 -0.01 -0.02        ±0.06 
24-35   -0.03 -0.04 -0.04 -0.02 -0.01  0.02  0.04  0.05  0.06  0.07  0.07  0.06        ±0.06 

(c) Cross-correlations $r_{\alpha\hat{a}}(k)$ between the prewhitened input and the output residuals 
0-11    -0.06  0.03 -0.01  0.00  0.01  0.01  0.01 -0.04  0.02  0.07 -0.03 -0.02        ±0.06 
12-23   -0.03 -0.11  0.02  0.04  0.04  0.01  0.01 -0.15 -0.03 -0.07 -0.08  0.02        ±0.06 
24-35   -0.01  0.02  0.05 -0.07  0.00  0.04 -0.15  0.04  0.03 -0.02  0.00 -0.03        ±0.06 

FIGURE 12.6 (a) Estimated autocorrelations $r_{\hat{a}\hat{a}}(k)$ of the residuals from the fitted gas furnace 
model, (b) estimated cross-correlations $r_{x\hat{a}}(k)$ between the input and the output residuals, and 
(c) estimated cross-correlations $r_{\alpha\hat{a}}(k)$ between the prewhitened input and the output residuals. 

Parameter Estimation Using R. We will now use the R software to fit the model employed 
in (12.4.1) and (12.4.2) to the gas furnace data. The parameter estimation can be performed 
using the function tfm1() in the MTS package developed for multivariate time series 
analysis. The arguments of this function are tfm1(Y, X, orderX=c(r,s,b), orderN=c(p,d,q)). 
The function call and the resulting output are shown below: 

> library(MTS) 
> m1=tfm1(Y,X,orderX=c(2,2,3),orderN=c(2,0,0)) 
Model Output: 
Delay: 3 
Transfer function coefficients & s.e.: 
in the order: constant, omega, and delta: 1 3 2 
         [,1]    [,2]   [,3]   [,4]  [,5]    [,6] 
v      53.371 -0.5302 -0.371 -0.511 0.565 -0.0119 
se.v    0.142  0.0745  0.146  0.149 0.200  0.1415 
ARMA order: 
[1] 2 0 0 
ARMA coefficients & s.e.: 
            [,1]    [,2] 
coef.arma 1.5315 -0.6321 
se.arma   0.0472  0.0502 
> names(m1)               # check contents of output 
[1] "estimate"  "sigma2"    "residuals" "varcoef"   "Nt" 
> m1$sigma2               # residual variance 
[1] 0.0576 
> acf(m1$residuals)       # acf of the residuals 
> ccf(m1$residuals,X)     # cross-correlation between input series and residuals 
> ccf(m1$residuals,xprev) # cross-correlation between prewhitened input and residuals 

Using the output from R and allowing for sign differences in the definition of $\omega(B)$, the 
estimated transfer function-noise model is 

$$Y_t = \frac{-(0.53 + 0.37B + 0.51B^2)}{1 - 0.57B + 0.01B^2} X_{t-3} + \frac{1}{1 - 1.53B + 0.63B^2} a_t$$

We see that the parameter estimates for the transfer function and noise models are nearly 
identical to those shown in (12.4.1) and (12.4.2). The estimate of the residual variance is 
0.0576, which is also close to the value 0.0561 quoted in the text. In addition, the residual 
autocorrelations, the cross-correlations between the input $X_t$ and the residuals, and the 
cross-correlations between the prewhitened input and the residuals (not shown) were small 
and close to those displayed in Figure 12.6, although some minor differences were seen in 
the patterns. 

Step and Impulse Responses. The estimate $\hat{\delta}_2 = 0.01$ in (12.4.1) is very small compared 
with its standard error $\pm 0.14$, and the parameter $\delta_2$ can in fact be omitted from the model 
without affecting the estimates of the remaining parameters to the accuracy considered. 
The final form of the combined transfer function-noise model for the gas furnace data is 

$$Y_t = \frac{-(0.53 + 0.37B + 0.51B^2)}{1 - 0.57B} X_{t-3} + \frac{1}{1 - 1.53B + 0.63B^2} a_t$$

The step and impulse response functions corresponding to the transfer function model 

$$(1 - 0.57B)Y_t = -(0.53 + 0.37B + 0.51B^2)X_{t-3}$$

are given in Figure 12.7. Using (11.2.5), the steady-state gain of the coded data is 

$$g = \frac{-(0.53 + 0.37 + 0.51)}{1 - 0.57} = -3.3$$

The results agree very closely with those obtained by cross-spectral analysis (Jenkins and 
Watts, 1968). 
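The impulse and step responses of the fitted transfer function can be generated by expanding $\omega(B)B^b/\delta(B)$ in powers of $B$. The following Python sketch does this for the model $(1 - 0.57B)Y_t = -(0.53 + 0.37B + 0.51B^2)X_{t-3}$; the helper name `impulse_response` is a hypothetical choice for illustration, and the step response is simply the running sum of the impulse weights.

```python
def impulse_response(omega, delta, b, n):
    """First n impulse response weights v_0..v_{n-1} of omega(B) B^b / delta(B).

    omega and delta hold polynomial coefficients in ascending powers of B,
    with delta[0] == 1.
    """
    num = [0.0] * b + list(omega)
    v = []
    for j in range(n):
        acc = num[j] if j < len(num) else 0.0
        for i in range(1, min(j, len(delta) - 1) + 1):
            acc -= delta[i] * v[j - i]
        v.append(acc)
    return v


omega = [-0.53, -0.37, -0.51]   # -(0.53 + 0.37B + 0.51B^2)
delta = [1.0, -0.57]            # 1 - 0.57B
v = impulse_response(omega, delta, b=3, n=40)

# Step response: cumulative sum of the impulse response weights.
step, running = [], 0.0
for weight in v:
    running += weight
    step.append(running)

# Steady-state gain g = omega(1) / delta(1).
gain = sum(omega) / sum(delta)
print([round(w, 2) for w in v[:8]])
print(round(gain, 2))
```

The step response settles at the steady-state gain, which is why the two quantities can be cross-checked against each other.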


Choice of Sampling Interval. When a choice is available, the sampling interval should 
be taken as fairly short compared with the time constants expected for the system. When 
in doubt, the analysis can be repeated with several trial sampling intervals. In the choice 
of sampling interval, it is the noise at the output that is important, and its variance should 
approach a minimum value as the interval is shortened. Thus, in the gas furnace example 
that we have used for illustration, a pen recorder was used to provide a continuous record 
of input and output. The discrete data that we have actually analyzed were obtained by 
reading off values from this continuous record at points separated by 9-second intervals. 
This interval was chosen because inspection of the traces shown in Figure 12.1 suggested 
that it ought to be adequate to allow all the variation (apart from slight pen chatter) that 
occurred in input and output to be taken account of. The use of this kind of common 
sense is usually a reliable guide in choosing the interval. The estimated mean square error 
for the gas furnace data, obtained by dividing $\sum (Y_t - \hat{Y}_t)^2$ by the appropriate number of 
degrees of freedom, is shown for various time intervals in Table 12.5. These values are also 
plotted in Figure 12.8. Little change in mean square error occurs until the interval is almost 
40 seconds, when a very rapid rise occurs. There is little difference in the mean square 
error, or indeed the plotted step response, for the 9-, 18-, and 27-second intervals, but a 
considerable change occurs when the 36-second interval is used. It will be seen that the 
9-second interval we have used in this example is, in fact, conservative. 

FIGURE 12.7 Impulse and step responses for transfer function model 
$(1 - 0.57B)Y_t = -(0.53 + 0.37B + 0.51B^2)X_{t-3}$ fitted to coded gas furnace data. 

TABLE 12.5 Mean Square Error at the Output for Various Choices of the Sampling Interval 
for Gas Furnace Data 

Interval (seconds)           9      18     27     36     45     54     72 
Number of data points N    296     148     98     74     59     49     37 
MS error                  0.71    0.78   0.74   0.95   0.97   1.56   7.11 

12.4.2 Simulated Example with Two Inputs 

FIGURE 12.8 Mean square error at the output for various choices of sampling interval. 

The fitting of models involving more than one input series involves no difficulty in principle, 
except for the increase in the number of parameters that has to be handled. For example, 
for two inputs we can write the model as 

$$y_t = \delta_1^{-1}(B)\omega_1(B)x_{1,t-b_1} + \delta_2^{-1}(B)\omega_2(B)x_{2,t-b_2} + n_t$$

with 

$$n_t = \phi^{-1}(B)\theta(B)a_t$$

where $y_t = \nabla^d Y_t$, $x_{1t} = \nabla^d X_{1t}$, $x_{2t} = \nabla^d X_{2t}$, and $n_t = \nabla^d N_t$ are stationary processes. 
To compute the $a_t$'s, we first calculate for specific values of the parameters $b_1$, $\delta_1$, $\omega_1$, 

$$y_{1,t} = \delta_1^{-1}(B)\omega_1(B)x_{1,t-b_1} \tag{12.4.3}$$

and for specific values of $b_2$, $\delta_2$, $\omega_2$, 

$$y_{2,t} = \delta_2^{-1}(B)\omega_2(B)x_{2,t-b_2} \tag{12.4.4}$$

Then the noise $n_t$ can be calculated from 

$$n_t = y_t - y_{1,t} - y_{2,t} \tag{12.4.5}$$

and finally, $a_t$ from 

$$a_t = \theta^{-1}(B)\phi(B)n_t \tag{12.4.6}$$

Simulated Example. It is clear that even simple situations can lead to the estimation of 
a large number of parameters. The example below, with two input variables and delayed 
first-order models, has eight unknown parameters. To illustrate the behavior of the iterative 
nonlinear least-squares procedure described in Section 12.3.2 when used to obtain estimates 
of the parameters in such models, an experiment was performed using manufactured data, 
details of which are given in Box et al. (1967b). The data were generated from the model 
written in $\nabla$ form as 

$$Y_t = \beta + g_1 \frac{1 + \eta_1\nabla}{1 + \xi_1\nabla} X_{1,t-1} + g_2 \frac{1 + \eta_2\nabla}{1 + \xi_2\nabla} X_{2,t-1} + \frac{1}{1 - \phi_1 B} a_t \tag{12.4.7}$$

with $\beta = 60$, $g_1 = 13.0$, $\eta_1 = -0.6$, $\xi_1 = 4.0$, $g_2 = -5.5$, $\eta_2 = -0.6$, $\xi_2 = 4.0$, $\phi_1 = 0.5$, and 
$\sigma_a^2 = 9.0$. The input variables $X_1$ and $X_2$ were changed according to a randomized $2^2$ 
factorial design replicated three times. Each input condition was supposed to be held 
fixed for 5 minutes and output observations taken every minute. The data are plotted in 
Figure 12.9 and appear as Series K in the Collection of Time Series section in Part Five. 

FIGURE 12.9 Data for simulated two-input example (Series K). 

The constrained iterative nonlinear least-squares program, described in Chapter 7, was 
used to obtain the least-squares estimates, so that it was only necessary to set up the 
calculation of the $a_t$'s. Thus, for specified values of the parameters $g_1$, $g_2$, $\xi_1$, $\xi_2$, $\eta_1$, and $\eta_2$, 
the values $y_{1,t}$ and $y_{2,t}$ can be obtained from 

$$(1 + \xi_1\nabla)y_{1,t} = g_1(1 + \eta_1\nabla)X_{1,t-1}$$
$$(1 + \xi_2\nabla)y_{2,t} = g_2(1 + \eta_2\nabla)X_{2,t-1}$$

and can be used to calculate 

$$N_t = Y_t - y_{1,t} - y_{2,t}$$

Finally, for a specified value of $\phi_1$, $a_t$ can be calculated from 

$$a_t = N_t - \phi_1 N_{t-1}$$

It was assumed that the process inputs had been maintained at their center conditions for 
some time before the start of the experiment, so that $y_{1,t}$, $y_{2,t}$, and $N_t$ may be computed 
from $t = 0$ onward and $a_t$ from $t = 1$. 

Two runs were made of the nonlinear least-squares procedure using two different sets 
of initial values. In the first, the parameters were chosen as representing what a person 
reasonably familiar with the process might guess for initial values. In the second, 
the starting value for $\beta$ was chosen to be the sample mean $\bar{Y}$ of all observations and all 
other starting values were set equal to 0.1. Thus, the second run represents a much more 
extreme situation than would normally arise in practice. Convergence with the first set of 
initial values occurred after five iterations, while convergence with the second set occurred 
after nine iterations. These results suggest that in realistic circumstances, multiple inputs 
can be handled without serious estimation difficulties. 


12.5 FORECASTING WITH TRANSFER FUNCTION MODELS USING 
LEADING INDICATORS 

Frequently, forecasts of a time series $Y_t, Y_{t-1}, \ldots$ may be considerably improved by using 
information coming from some associated series $X_t, X_{t-1}, \ldots$. This is particularly true if 
changes in $Y$ tend to be anticipated by changes in $X$, in which case economists call $X$ a 
"leading indicator" for $Y$. 

To obtain an optimal forecast using information from both series $Y_t$ and $X_t$, we first 
build a transfer function-noise model connecting the series $Y_t$ and $X_t$ in the manner already 
outlined. Suppose, using previous notations, that an adequate model is 

$$Y_t = \delta^{-1}(B)\omega(B)X_{t-b} + \varphi^{-1}(B)\theta(B)a_t \qquad b \ge 0 \tag{12.5.1}$$

In general, the noise component of this model, which is assumed statistically independent 
of the input $X_t$, is nonstationary with $\varphi(B) = \phi(B)\nabla^d$, so that if $y_t = \nabla^d Y_t$ and $x_t = \nabla^d X_t$, 

$$y_t = \delta^{-1}(B)\omega(B)x_{t-b} + \phi^{-1}(B)\theta(B)a_t$$

Also, we will assume that an adequate stochastic model for the input or leading series $X_t$ is 

$$X_t = \varphi_x^{-1}(B)\theta_x(B)\alpha_t \tag{12.5.2}$$

so that with $\varphi_x(B) = \phi_x(B)\nabla^d$, 

$$x_t = \phi_x^{-1}(B)\theta_x(B)\alpha_t$$

12.5.1 Minimum Mean Square Error Forecast 

Now (12.5.1) may be written as 

$$Y_t = v(B)\alpha_t + \psi(B)a_t \tag{12.5.3}$$

with the $\alpha_t$'s and the $a_t$'s statistically independent white noise, and 

$$v(B) = \delta^{-1}(B)\omega(B)B^b\varphi_x^{-1}(B)\theta_x(B)$$

Arguing as in Section 5.1.1, suppose that the forecast $\hat{Y}_t(l)$ of $Y_{t+l}$ made at origin $t$ is of the 
form 

$$\hat{Y}_t(l) = \sum_{j=0}^{\infty} v_{l+j}^0 \alpha_{t-j} + \sum_{j=0}^{\infty} \psi_{l+j}^0 a_{t-j}$$

Then 

$$Y_{t+l} - \hat{Y}_t(l) = \sum_{i=0}^{l-1} (v_i\alpha_{t+l-i} + \psi_i a_{t+l-i}) + \sum_{j=0}^{\infty} [(v_{l+j} - v_{l+j}^0)\alpha_{t-j} + (\psi_{l+j} - \psi_{l+j}^0)a_{t-j}]$$

and 

$$E[(Y_{t+l} - \hat{Y}_t(l))^2] = (v_0^2 + v_1^2 + \cdots + v_{l-1}^2)\sigma_\alpha^2 + (1 + \psi_1^2 + \cdots + \psi_{l-1}^2)\sigma_a^2 + \sum_{j=0}^{\infty} [(v_{l+j} - v_{l+j}^0)^2\sigma_\alpha^2 + (\psi_{l+j} - \psi_{l+j}^0)^2\sigma_a^2]$$

which is minimized only if $v_{l+j}^0 = v_{l+j}$ and $\psi_{l+j}^0 = \psi_{l+j}$ for $j = 0, 1, 2, \ldots$. Thus, the 
minimum mean square error forecast $\hat{Y}_t(l)$ of $Y_{t+l}$ at origin $t$ is given by the conditional 
expectation of $Y_{t+l}$ at time $t$, based on the past history of information on both series $Y_t$ and 
$X_t$ through time $t$. Theoretically, this expectation is conditional on knowledge of the series 
from the infinite past up to the present origin $t$. As in Chapter 5, such results are of practical 
use because, usually, the forecasts depend appreciably only on recent past values of the 
series $X_t$ and $Y_t$. 


Computation of the Forecast. Now (12.5.1) may be written as 

$$\varphi(B)\delta(B)Y_t = \varphi(B)\omega(B)X_{t-b} + \delta(B)\theta(B)a_t$$

which we will write as 

$$\delta^*(B)Y_t = \omega^*(B)X_{t-b} + \theta^*(B)a_t$$

Then, using square brackets to denote conditional expectations at time $t$, and writing 
$p^* = p + d$, we have for the lead $l$ forecast 

$$\hat{Y}_t(l) = [Y_{t+l}] = \delta_1^*[Y_{t+l-1}] + \cdots + \delta_{p^*+r}^*[Y_{t+l-p^*-r}] + \omega_0^*[X_{t+l-b}] - \cdots - \omega_{p^*+s}^*[X_{t+l-b-p^*-s}] + [a_{t+l}] - \theta_1^*[a_{t+l-1}] - \cdots - \theta_{q+r}^*[a_{t+l-q-r}] \tag{12.5.4}$$

where 

$$[Y_{t+j}] = \begin{cases} Y_{t+j} & j \le 0 \\ \hat{Y}_t(j) & j > 0 \end{cases} \qquad [X_{t+j}] = \begin{cases} X_{t+j} & j \le 0 \\ \hat{X}_t(j) & j > 0 \end{cases} \qquad [a_{t+j}] = \begin{cases} a_{t+j} & j \le 0 \\ 0 & j > 0 \end{cases} \tag{12.5.5}$$

and $a_t$ is calculated from (12.5.1), which if $b \ge 1$ is equivalent to 

$$a_t = Y_t - \hat{Y}_{t-1}(1)$$

Thus, by appropriate substitutions, the minimum mean square error forecast is readily 
computed directly using (12.5.4) and (12.5.5). The forecasts $\hat{X}_t(j)$ are obtained in the usual 
way (see Section 5.2) utilizing the univariate ARIMA model (12.5.2) for the input series 
$X_t$. 

It is important to note that the conditional expectations in (12.5.4) and (12.5.5) are taken 
with respect to values in both series $Y_t$ and $X_t$ through time $t$, but because of the assumed 
independence between input $X_t$ and noise $N_t$ in (12.5.1), it follows in particular that we 
will have 

$$\hat{X}_t(j) = E[X_{t+j} \mid X_t, X_{t-1}, \ldots, Y_t, Y_{t-1}, \ldots] = E[X_{t+j} \mid X_t, X_{t-1}, \ldots]$$

That is, given the past values of the input series $X_t$, the optimal forecasts of its future 
values depend only on the past $X$'s and cannot be improved by the additional knowledge of 
the past $Y$'s; hence, the optimal values $\hat{X}_t(j)$ can be obtained directly from the univariate 
model (12.5.2). 

Variance of the Forecast Error. The $v_j$ weights and the $\psi_j$ weights of (12.5.3) may be 
obtained explicitly by equating coefficients in 

$$\delta(B)\varphi_x(B)v(B) = \omega(B)\theta_x(B)B^b$$

and in 

$$\varphi(B)\psi(B) = \theta(B)$$

The variance of the lead $l$ forecast error is then given by 

$$V(l) = E[(Y_{t+l} - \hat{Y}_t(l))^2] = \sigma_\alpha^2 \sum_{j=b}^{l-1} v_j^2 + \sigma_a^2 \sum_{j=0}^{l-1} \psi_j^2 \tag{12.5.6}$$

Forecasts as a Weighted Aggregate of Previous Observations. For any given example, it 
is instructive to consider precisely how the forecasts of future values of the series $Y_t$ utilize 
the previous values of the $X_t$ and $Y_t$ series. We have seen in Section 5.3.3 how the forecasts 
may be written as linear aggregates of previous values of the series. Thus, for forecasts of 
the input or leading indicator, we could write 

$$\hat{X}_t(l) = \sum_{j=1}^{\infty} \pi_j^{(l)} X_{t+1-j} \tag{12.5.7}$$

The weights $\pi_j^{(1)} = \pi_j$ arise when the model (12.5.2) is written in the infinite autoregressive 
form 

$$\alpha_t = \theta_x^{-1}(B)\varphi_x(B)X_t = X_t - \pi_1 X_{t-1} - \pi_2 X_{t-2} - \cdots$$

and may thus be obtained by explicitly equating coefficients in 

$$\varphi_x(B) = (1 - \pi_1 B - \pi_2 B^2 - \cdots)\theta_x(B)$$

Also, using (5.3.9), 

$$\pi_j^{(l)} = \pi_{j+l-1} + \sum_{h=1}^{l-1} \pi_h \pi_j^{(l-h)} \tag{12.5.8}$$

In a similar way, we can write the transfer function model (12.5.1) in the form 

$$a_t = Y_t - \sum_{j=1}^{\infty} P_j Y_{t-j} - \sum_{j=0}^{\infty} Q_j X_{t-j} \tag{12.5.9}$$

It should be noted that if the transfer function between the input or leading indicator series 
$X_t$ and the output $Y_t$ is such that $b > 0$, then $v_j = 0$ for $j < b$, and so $Q_0, Q_1, \ldots, Q_{b-1}$ in 
(12.5.9) will also be zero. 

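Now (12.5.9) may be written as 

$$a_t = \left(1 - \sum_{j=1}^{\infty} P_j B^j\right) Y_t - \left(\sum_{j=0}^{\infty} Q_j B^j\right) X_t$$

Comparison with (12.5.1) shows that the $P_j$ and $Q_j$ weights may be obtained by equating 
coefficients in the expressions 

$$\theta(B)\left(1 - \sum_{j=1}^{\infty} P_j B^j\right) = \varphi(B)$$

$$\theta(B)\delta(B)\sum_{j=0}^{\infty} Q_j B^j = \varphi(B)\omega(B)B^b$$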

On substituting $t + l$ for $t$ in (12.5.9), and taking conditional expectations at origin $t$, we 
have the lead $l$ forecast in the form 

$$\hat{Y}_t(l) = \sum_{j=1}^{\infty} P_j[Y_{t+l-j}] + \sum_{j=0}^{\infty} Q_j[X_{t+l-j}] \tag{12.5.10}$$

Now the lead 1 forecast is $\hat{Y}_t(1) = \sum_{j=1}^{\infty} P_j Y_{t+1-j} + Q_0[X_{t+1}] + \sum_{j=1}^{\infty} Q_j X_{t+1-j}$, which for 
$b > 0$ becomes 

$$\hat{Y}_t(1) = \sum_{j=1}^{\infty} P_j Y_{t+1-j} + \sum_{j=1}^{\infty} Q_j X_{t+1-j}$$

Also, the quantities in square brackets in (12.5.10) are either known values of the $X_t$ and 
$Y_t$ series or forecasts that are linear functions of these known values. 

Thus, we can write the lead $l$ forecast in terms of the values of the series that have 
already occurred at time $t$ in the form 

$$\hat{Y}_t(l) = \sum_{j=1}^{\infty} P_j^{(l)} Y_{t+1-j} + \sum_{j=1}^{\infty} Q_j^{(l)} X_{t+1-j} \tag{12.5.11}$$

where, for $b > 0$, the coefficients $P_j^{(l)}$ and $Q_j^{(l)}$ may be computed recursively as follows: 

$$P_j^{(1)} = P_j \qquad Q_j^{(1)} = Q_j$$

$$P_j^{(l)} = P_{j+l-1} + \sum_{h=1}^{l-1} P_h P_j^{(l-h)} \tag{12.5.12}$$

$$Q_j^{(l)} = Q_{j+l-1} + \sum_{h=1}^{l-1} \{P_h Q_j^{(l-h)} + Q_h \pi_j^{(l-h)}\}$$
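The recursions (12.5.8) and (12.5.12) can be coded directly. The following Python sketch is a minimal illustration, assuming $b > 0$ so that $Q_0 = 0$; the function name `lead_weights` and the toy weights (a first-order model $Y_t = 0.5Y_{t-1} + 2X_{t-1} + a_t$ with an AR(1) input, $\pi_1 = 0.8$) are hypothetical and chosen so that the lead 2 result can be verified by hand.

```python
def lead_weights(P, Q, pi, lead, n):
    """Weights P_j^(lead) and Q_j^(lead), j = 1..n, from (12.5.8) and (12.5.12).

    P, Q, and pi are lists indexed by j (index 0 is unused and should be 0);
    weights beyond the end of a list are treated as zero.
    """
    def at(w, j):
        return w[j] if j < len(w) else 0.0

    Pl = {1: [0.0] + [at(P, j) for j in range(1, n + 1)]}
    Ql = {1: [0.0] + [at(Q, j) for j in range(1, n + 1)]}
    pil = {1: [0.0] + [at(pi, j) for j in range(1, n + 1)]}
    for l in range(2, lead + 1):
        Pl[l] = [0.0] * (n + 1)
        Ql[l] = [0.0] * (n + 1)
        pil[l] = [0.0] * (n + 1)
        for j in range(1, n + 1):
            pil[l][j] = at(pi, j + l - 1) + sum(
                at(pi, h) * pil[l - h][j] for h in range(1, l))
            Pl[l][j] = at(P, j + l - 1) + sum(
                at(P, h) * Pl[l - h][j] for h in range(1, l))
            Ql[l][j] = at(Q, j + l - 1) + sum(
                at(P, h) * Ql[l - h][j] + at(Q, h) * pil[l - h][j]
                for h in range(1, l))
    return Pl[lead][1:], Ql[lead][1:]


# Toy weights (illustrative only): P_1 = 0.5, Q_1 = 2.0, pi_1 = 0.8.
P2, Q2 = lead_weights(P=[0.0, 0.5], Q=[0.0, 2.0], pi=[0.0, 0.8], lead=2, n=3)
print(P2, Q2)
```

For this toy model, direct substitution gives $\hat{Y}_t(2) = 0.5\hat{Y}_t(1) + 2\hat{X}_t(1) = 0.25Y_t + 2.6X_t$, which is what the recursion returns for $j = 1$.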


12.5.2 Forecast of CO2 Output from Gas Furnace 

For illustration, consider the gas furnace data shown in Figure 12.1. For this example, the 
fitted model (see Section 12.4.1) was 

$$Y_t = \frac{-(0.53 + 0.37B + 0.51B^2)}{1 - 0.57B} X_{t-3} + \frac{1}{1 - 1.53B + 0.63B^2} a_t$$

and $(1 - 1.97B + 1.37B^2 - 0.34B^3)X_t = \alpha_t$. The forecast function, written in the form 
(12.5.4), is thus 

$$\hat{Y}_t(l) = [Y_{t+l}] = 2.1[Y_{t+l-1}] - 1.5021[Y_{t+l-2}] + 0.3591[Y_{t+l-3}] - 0.53[X_{t+l-3}] + 0.4409[X_{t+l-4}] - 0.2778[X_{t+l-5}] + 0.5472[X_{t+l-6}] - 0.3213[X_{t+l-7}] + [a_{t+l}] - 0.57[a_{t+l-1}]$$

Figure 12.10 shows the forecasts for lead times $l = 1, 2, \ldots, 12$ made at origin $t = 206$. The 
$\pi_j$, $P_j$, and $Q_j$ weights for the model are given in Table 12.6. 

Figure 12.10 shows the weights $P_j^{(5)}$ and $Q_j^{(5)}$ appropriate to the lead 5 forecast. The 
weights $v_j$ and $\psi_j$ of (12.5.3) are listed in Table 12.7. Using estimates $\hat{\sigma}_\alpha^2 = 0.0353$ and $\hat{\sigma}_a^2 = 
0.0561$, obtained in Sections 12.2.2 and 12.4.1, respectively, (12.5.6) may be employed 
to obtain variances of the forecast errors and the 50 and 95% probability limits shown in 
Figure 12.10. 
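The calculation behind (12.5.6) can be sketched in a few lines of Python. The code below expands the $\psi_j$ and $v_j$ weights for the gas furnace model by power-series division and then accumulates the forecast error variance; the helper names are hypothetical, and the rounded parameter estimates quoted in the text are used as inputs, so small differences from the published tables are to be expected.

```python
def poly_mul(a, b):
    """Product of two polynomials in B (coefficients in ascending powers)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def poly_div(num, den, n):
    """First n coefficients of the power series num(B) / den(B)."""
    out = []
    for j in range(n):
        acc = num[j] if j < len(num) else 0.0
        for i in range(1, min(j, len(den) - 1) + 1):
            acc -= den[i] * out[j - i]
        out.append(acc / den[0])
    return out


phi = [1.0, -1.53, 0.63]                        # noise AR operator
delta = [1.0, -0.57]
omega_b = [0.0, 0.0, 0.0, -0.53, -0.37, -0.51]   # omega(B) B^3
phi_x = [1.0, -1.97, 1.37, -0.34]               # AR(3) operator for the input
sigma2_alpha, sigma2_a = 0.0353, 0.0561

n_terms = 12
psi = poly_div([1.0], phi, n_terms)                         # phi(B) psi(B) = 1
v = poly_div(omega_b, poly_mul(delta, phi_x), n_terms)      # theta_x(B) = 1


def forecast_sd(lead):
    """Standard deviation of the lead-l forecast error from (12.5.6)."""
    var = (sigma2_alpha * sum(w * w for w in v[:lead])
           + sigma2_a * sum(w * w for w in psi[:lead]))
    return var ** 0.5


sd = {lead: forecast_sd(lead) for lead in range(1, 13)}
print([round(w, 2) for w in psi[:6]])
print([round(w, 2) for w in v[:7]])
print({lead: round(s, 2) for lead, s in sd.items()})
```

Since $v_j = 0$ for $j < b$, summing the $v_j^2$ from $j = 0$ is equivalent to summing from $j = b$ as in (12.5.6).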

TABLE 12.6 $\pi_j$, $P_j$, and $Q_j$ Weights for Gas Furnace Model 

 j     $\pi_j$     $P_j$     $Q_j$ 
 1     1.97     1.53     0 
 2    -1.37    -0.63     0 
 3     0.34     0       -0.53 
 4     0        0        0.14 
 5     0        0       -0.20 
 6     0        0        0.43 
 7     0        0       -0.07 
 8     0        0       -0.04 
 9     0        0       -0.02 
10     0        0       -0.01 
11     0        0       -0.01 

FIGURE 12.10 Forecast of CO2 output from a gas furnace using input and output series. 

TABLE 12.7 $v_j$ and $\psi_j$ Weights for Gas Furnace Model 

 j     $v_j$     $\psi_j$ 
 0     0        1 
 1     0        1.53 
 2     0        1.71 
 3    -0.53     1.65 
 4    -1.72     1.45 
 5    -3.55     1.18 
 6    -5.33     0.89 
 7    -6.51     0.62 
 8    -6.89     0.39 
 9    -6.57     0.20 
10    -5.77     0.06 
11    -4.73    -0.03 

To illustrate the advantages of using an input or leading indicator series $X_t$ in forecasting, 
assume that only the $Y_t$ series is available. The usual identification and fitting procedure 
applied to this series indicated that it is well described by an ARMA(4, 2) process, 

$$(1 - 2.42B + 2.388B^2 - 1.168B^3 + 0.23B^4)Y_t = (1 - 0.31B + 0.47B^2)\varepsilon_t$$

with $\sigma_\varepsilon^2 = 0.1081$. Table 12.8 shows estimated standard deviations of forecast errors made 
with and without the leading indicator series $X_t$. As might be expected, for short lead times 
use of the leading indicator can produce forecasts of considerably greater accuracy. 

TABLE 12.8 Estimated Standard Deviations of Forecast Errors Made With and Without the 
Leading Indicator for Gas Furnace Data 

 l     With Leading Indicator     Without Leading Indicator 
 1            0.23                        0.33 
 2            0.43                        0.77 
 3            0.59                        1.30 
 4            0.72                        1.82 
 5            0.86                        2.24 
 6            1.12                        2.54 
 7            1.52                        2.74 
 8            1.96                        2.86 
 9            2.35                        2.95 
10            2.65                        3.01 
11            2.87                        3.05 
12            3.00                        3.08 

Univariate Modeling Check. To further confirm the univariate modeling results for the 
series $Y_t$, we can use results from Appendix A4.3 to obtain the nature of the univariate 
ARIMA model for $Y_t$ that is implied by the transfer function-noise model between $Y_t$ and 
$X_t$ and the univariate AR(3) model for $X_t$. These models imply that 

$$(1 - 0.57B)(1 - 1.53B + 0.63B^2)Y_t = -(0.53 + 0.37B + 0.51B^2)(1 - 1.53B + 0.63B^2)X_{t-3} + (1 - 0.57B)a_t \tag{12.5.13}$$

But since 

$$\varphi_x(B) = 1 - 1.97B + 1.37B^2 - 0.34B^3 \simeq (1 - 1.46B + 0.60B^2)(1 - 0.52B)$$

in the AR(3) model for $X_t$, the right-hand side of (12.5.13) reduces approximately to 
$-(0.53 + 0.37B + 0.51B^2)(1 - 0.52B)^{-1}\alpha_{t-3} + (1 - 0.57B)a_t$, and hence we obtain 

$$(1 - 0.52B)(1 - 0.57B)(1 - 1.53B + 0.63B^2)Y_t = -(0.53 + 0.37B + 0.51B^2)\alpha_{t-3} + (1 - 0.52B)(1 - 0.57B)a_t$$

The results of Appendix A4.3 imply that the right-hand side of this last equation has an 
MA(2) model representation as $(1 - \theta_1 B - \theta_2 B^2)\varepsilon_t$, and the nonzero autocovariances of 
the MA(2) are determined from the right-hand side expression above to be 

$$\lambda_0 = 0.1516 \qquad \lambda_1 = -0.0657 \qquad \lambda_2 = 0.0262$$

Hence, the implied univariate model for $Y_t$ would be ARMA(4, 2), with approximate 
AR operator equal to $(1 - 2.62B + 2.59B^2 - 1.14B^3 + 0.19B^4)$, and from methods of 
Appendix A6.2, the MA(2) operator would be $(1 - 0.44B + 0.21B^2)$, with $\sigma_\varepsilon^2 = 0.1220$; 
that is, the univariate model for $Y_t$ would be 

$$(1 - 2.62B + 2.59B^2 - 1.14B^3 + 0.19B^4)Y_t = (1 - 0.44B + 0.21B^2)\varepsilon_t$$

This model result is in good agreement with the univariate model actually identified and 
fitted to the series $Y_t$, which gives an additional check and provides further support to the 
transfer function-noise model that has been specified for the gas furnace data. 
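The autocovariances and the AR operator quoted in this check can be reproduced numerically. The Python sketch below treats the right-hand side of the last equation as the sum of two independent moving average components, one driven by $\alpha_t$ with variance 0.0353 and one driven by $a_t$ with variance 0.0561, and adds their autocovariances lag by lag. The helper names `poly_mul` and `ma_autocov` are hypothetical.

```python
def poly_mul(a, b):
    """Product of two polynomials in B (coefficients in ascending powers)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def ma_autocov(coefs, sigma2, max_lag):
    """Autocovariances at lags 0..max_lag of sum_i coefs[i] * e_{t-i}."""
    return [sigma2 * sum(coefs[i] * coefs[i + k]
                         for i in range(len(coefs) - k))
            for k in range(max_lag + 1)]


# Component driven by the input shocks: -(0.53 + 0.37B + 0.51B^2) alpha_{t-3}.
# The common delay of 3 does not affect the autocovariances.
alpha_part = ma_autocov([-0.53, -0.37, -0.51], 0.0353, 2)
# Component driven by the noise shocks: (1 - 0.52B)(1 - 0.57B) a_t.
a_part = ma_autocov(poly_mul([1.0, -0.52], [1.0, -0.57]), 0.0561, 2)

lam = [u + w for u, w in zip(alpha_part, a_part)]
ar_operator = poly_mul(poly_mul([1.0, -0.52], [1.0, -0.57]), [1.0, -1.53, 0.63])

print([round(x, 4) for x in lam])
print([round(c, 2) for c in ar_operator])
```

Adding the component autocovariances is valid only because $\alpha_t$ and $a_t$ are assumed independent, as in (12.5.3).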
FIGURE 12.11 Sales data with leading indicator. 

12.5.3 Forecast of Nonstationary Sales Data Using a Leading Indicator 

As a second illustration, consider the data on sales $Y_t$ in relation to a leading indicator $X_t$, 
plotted in Figure 12.11 and listed as Series M in the Collection of Time Series section in 
Part Five. The data are typical of those arising in business forecasting and are well fitted by 
the nonstationary model³ 

$$y_t = 0.035 + \frac{4.82}{1 - 0.72B}x_{t-3} + (1 - 0.54B)a_t$$

$$x_t = (1 - 0.32B)\alpha_t$$

with $y_t$ and $x_t$ first differences of the series. The forecast function, in the form (12.5.4), is 
then 

$$\hat{Y}_t(l) = [Y_{t+l}] = 1.72[Y_{t+l-1}] - 0.72[Y_{t+l-2}] + 0.0098 + 4.82[X_{t+l-3}] - 4.82[X_{t+l-4}] + [a_{t+l}] - 1.26[a_{t+l-1}] + 0.3888[a_{t+l-2}]$$

Figure 12.12 shows the forecasts for lead times $l = 1, 2, \ldots, 12$ made at origin $t = 89$. 
The weights $v_j$ and $\psi_j$ are given in Table 12.9. 

Using the estimates $\hat{\sigma}_\alpha^2 = 0.0676$ and $\hat{\sigma}_a^2 = 0.0484$, obtained in fitting the above 
model, the variance of the forecast error may be found from (12.5.6). In particular, 
$V(l) = \sigma_a^2 \sum_{j=0}^{l-1} \psi_j^2$ for $l = 1, 2,$ and 3 in this specific case (note the delay of $b = 3$ in 
the transfer function model). The 50 and 95% probability limits are shown in Figure 12.12. 
It will be seen that in this particular example, the use of the leading indicator allows very 
accurate forecasts to be obtained for lead times $l = 1, 2,$ and 3. 

The $\pi_j$, $P_j^{(5)}$, and $Q_j^{(5)}$ weights for this model are given in Table 12.10. The weights $P_j^{(5)}$ 
and $Q_j^{(5)}$ appropriate to the lead 5 forecast are shown in Figure 12.12. 


3 Using data the latter part of which is listed as Series M. 
FIGURE 12.12 Forecast of sales at origin t = 89 with P and Q weights for lead 5 forecast. 


TABLE 12.9 $v_j$ and $\psi_j$ Weights for Nonstationary Model for Sales Data 

 j     v_j     psi_j   |    j     v_j     psi_j
 0     0       1       |    6     9.14    0.46
 1     0       0.46    |    7     9.86    0.46
 2     0       0.46    |    8    10.37    0.46
 3     4.82    0.46    |    9    10.75    0.46
 4     6.75    0.46    |   10    11.02    0.46
 5     8.14    0.46    |   11    11.21    0.46
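
The weights in Table 12.9 can be reproduced by expanding the operators of the fitted sales model as power series in $B$. The following Python sketch assumes that $v(B) = 4.82B^3(1 - 0.32B)/[(1 - 0.72B)(1 - B)]$ and $\psi(B) = (1 - 0.54B)/(1 - B)$, which is what the model of this section implies when $Y_t$ is expressed in terms of the shocks $\alpha_t$ and $a_t$. The function `forecast_variance` assumes that (12.5.6) has the form $V(l) = \sigma_\alpha^2\sum_{j<l} v_j^2 + \sigma_a^2\sum_{j<l}\psi_j^2$, which reduces to the expression quoted above for $l \le 3$. All helper names are hypothetical.

```python
def poly_mul(p, q):
    """Multiply two polynomials in B given as coefficient lists [c0, c1, ...]."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def series_div(num, den, n_terms):
    """First n_terms coefficients of num(B)/den(B), assuming den[0] == 1."""
    out = []
    for k in range(n_terms):
        acc = num[k] if k < len(num) else 0.0
        for i in range(1, min(k, len(den) - 1) + 1):
            acc -= den[i] * out[k - i]
        out.append(acc)
    return out


# v(B) = 4.82 B^3 (1 - 0.32B) / [(1 - 0.72B)(1 - B)]
v_num = poly_mul([0, 0, 0, 4.82], [1, -0.32])
v_den = poly_mul([1, -0.72], [1, -1])
v = series_div(v_num, v_den, 12)

# psi(B) = (1 - 0.54B) / (1 - B)
psi = series_div([1, -0.54], [1, -1], 12)

SIGMA2_ALPHA = 0.0676  # estimate quoted in the text
SIGMA2_A = 0.0484      # estimate quoted in the text


def forecast_variance(lead):
    """Assumed form of (12.5.6): input-shock part plus noise-shock part."""
    return (SIGMA2_ALPHA * sum(w * w for w in v[:lead])
            + SIGMA2_A * sum(w * w for w in psi[:lead]))


for j in range(12):
    print(j, round(v[j], 2), round(psi[j], 2))
print([round(forecast_variance(lead), 4) for lead in (1, 2, 3)])
```

Because $v_j = 0$ for $j < 3$, the input shocks contribute nothing to the forecast error variance at leads 1, 2, and 3, which is the reason the leading indicator makes these short-lead forecasts so accurate.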



TABLE 12.10 $\pi_j$, $P_j^{(5)}$, and $Q_j^{(5)}$ Weights for Nonstationary Model for Sales Data 

 j    pi_j    P_j^(5)   Q_j^(5)  |    j    pi_j    P_j^(5)   Q_j^(5)
 1    0.68    0.46       0       |    9    0.00    0.00      -0.74
 2    0.22    0.25       0       |   10    0.00    0.00      -0.59
 3    0.07    0.13       4.82    |   11    0.00    0.00      -0.29
 4    0.02    0.07       1.25    |   12    0.00    0.00      -0.13
 5    0.01    0.04      -0.29    |   13    0.00    0.00      -0.06
 6    0.00    0.02      -0.86    |   14    0.00    0.00      -0.02
 7    0.00    0.01      -0.97    |   15    0.00    0.00       0.00
 8    0.00    0.01      -0.89    |


12.6 SOME ASPECTS OF THE DESIGN OF EXPERIMENTS TO ESTIMATE 
TRANSFER FUNCTIONS 

In some engineering applications, the form of the input $X_t$ can be deliberately chosen so 
as to obtain good estimates of the parameters in the transfer function-noise model: 

$$Y_t = \delta^{-1}(B)\omega(B)X_{t-b} + N_t$$

The estimation of the transfer function is equivalent to estimation of a dynamic “regression” 
model, and the methods that can be used are very similar to those used in ordinary 
nondynamic regression. As might be expected, the same problems (see e.g. Box, 1966) 
face us. 

As with static regression, it is very important to be clear on the objective of the investigation. 
In some situations, we want to answer the question: If the input X is merely observed 
(but not interfered with), what can this tell us of the present and future behavior of the 
output Y under normal conditions of process operation? In other situations, the appropriate 
question is: If the input X is changed in some specific way, what change will be induced 
in the present and future behavior of the output Y? The types of data we need to answer 
these two questions are different. 

To answer the first question unambiguously, we must use data obtained by observing, 
but not interfering with, the normal operation of the system. In contrast, the second question 
can only be answered unambiguously from data in which deliberate changes have been 
induced into the input of the system; that is, the data must be specially generated by a 
designed experiment. 

Clearly, if X is to be used as a control variable, that is, a variable that may be used 
to manipulate the output, we need to answer the second question. To understand how we 
can design experiments to obtain valid estimates of the parameters of a cause-and-effect 
relationship, it is necessary to examine the assumptions of the analysis. 

A critical assumption is that the $X_t$'s are distributed independently of the $N_t$'s. When 
this assumption is violated, the following issues arise: 

1. The estimates we obtain are, in general, not even consistent. Specifically, as the 
sample size is made large, the estimates converge not on the true values but on other 
values differing from the true values by an unknown amount. 

2. The violation of this assumption is not detectable by examining the data. Therefore, 
the possibility that in any particular situation the independence assumption may not 
be true is a particularly disturbing one. The only way it is possible to guarantee its 
truth is by deliberately designing the experiment rather than using data that have 
simply “happened.” Specifically, we must deliberately generate and feed into the 
process an input $X_t$, which we know to be uncorrelated with $N_t$ because we have 
generated it by some external random process. 

The input $X_t$ can, of course, be autocorrelated; it is necessary only that it should not be 
cross-correlated with $N_t$. To satisfy this requirement, we could, for example, draw a set of 
random variates $\alpha_t$ and use them to generate any desired input process $X_t = \psi_X(B)\alpha_t$. 
Alternatively, we can choose a fixed “design,” for example, the factorial design used 
in Section 12.4.2, and randomize the order in which the runs are made. Appendix A12.2 
contains a preliminary discussion of some elementary design problems, and it is sufficient to 
expose some of the difficulties in the practical selection of the “optimal” stochastic input. 
In particular, as is true in a wider context: (1) it is difficult to decide what is a sensible 
criterion for optimality, and (2) the choice of “optimal” input depends on the values of the 
unknown parameters that are to be optimally estimated. In general, a white noise input has 
distinct advantages in simplifying identification, and if nothing very definite were known 
about the system under study, it would provide a sensible initial choice of input. 
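
The consequence of violating the independence assumption can be illustrated by a small simulation. The sketch below is a hypothetical example, not taken from the text: the system is $Y_t = 2X_{t-1} + N_t$ with an AR(1) disturbance $N_t = 0.8N_{t-1} + e_t$. In the designed case the input is generated externally as $(1 - 0.5B)X_t = \alpha_t$; in the "happened" case the input reacts to the disturbance through the assumed rule $X_t = -0.5N_t + u_t$. The gain, the noise model, the feedback rule, and the function names are all assumptions made for illustration.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x after removing the sample means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(n, feedback, seed=12345, gain=2.0):
    """Simulate Y_t = gain * X_{t-1} + N_t and return the fitted gain."""
    rng = random.Random(seed)
    noise = 0.0    # N_t, an AR(1) disturbance
    x_prev = 0.0   # X_{t-1}
    design = 0.0   # state of the externally generated AR(1) input
    x_lagged, y = [], []
    for _ in range(n):
        noise = 0.8 * noise + rng.gauss(0.0, 1.0)
        y.append(gain * x_prev + noise)
        x_lagged.append(x_prev)
        if feedback:
            # "Happened" data: the input responds to the current disturbance,
            # so X_t is correlated with N_{t+1}.
            x_prev = -0.5 * noise + rng.gauss(0.0, 1.0)
        else:
            # Designed input: (1 - 0.5B) X_t = alpha_t, generated independently of N_t.
            design = 0.5 * design + rng.gauss(0.0, 1.0)
            x_prev = design
    return ols_slope(x_lagged, y)


designed_slope = simulate(20000, feedback=False)
feedback_slope = simulate(20000, feedback=True)
print(round(designed_slope, 2), round(feedback_slope, 2))
```

With the designed input the fitted gain settles near the true value of 2 even though the input is autocorrelated. With the feedback rule it settles at a different value no matter how long the series is, which is the inconsistency described in item 1 above.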


APPENDIX A12.1 USE OF CROSS-SPECTRAL ANALYSIS FOR TRANSFER 
FUNCTION MODEL IDENTIFICATION 

In this appendix, we show that an alternative method for identifying transfer function 
models, which does not require prewhitening of the input, can be based on spectral analysis. 
Furthermore, it is easily generalized to multiple inputs. 

A12.1.1 Identification of Single-Input Transfer Function Models 

Suppose that the transfer function $v(B)$ is defined so as to allow the possibility of nonzero 
impulse response weights $v_j$ for $j$ a negative integer, so that 

$$v(B) = \sum_{k=-\infty}^{\infty} v_k B^k$$

Then if, corresponding to (12.2.3), the transfer function-noise model is 

$$y_t = v(B)x_t + n_t$$

equations (12.2.5) become 

$$\gamma_{xy}(k) = \sum_{j=-\infty}^{\infty} v_j\gamma_{xx}(k - j) \qquad k = 0, \pm 1, \pm 2, \ldots \tag{A12.1.1}$$

We now define a cross-covariance generating function 

$$\gamma^{xy}(B) = \sum_{k=-\infty}^{\infty} \gamma_{xy}(k)B^k \tag{A12.1.2}$$

which is analogous to the autocovariance generating function (3.1.10). On multiplying 
throughout in (A12.1.1) by $B^k$ and summing, we obtain 

$$\gamma^{xy}(B) = v(B)\gamma^{xx}(B) \tag{A12.1.3}$$


If we now substitute $B = e^{-i2\pi f}$ in (A12.1.2), we obtain the cross-spectrum $p_{xy}(f)$ between 
input $x_t$ and output $y_t$. Making the same substitution in (A12.1.3) yields 

$$v(e^{-i2\pi f}) = \frac{p_{xy}(f)}{p_{xx}(f)} \qquad -\tfrac{1}{2} \le f < \tfrac{1}{2} \tag{A12.1.4}$$

where 

$$v(e^{-i2\pi f}) = G(f)e^{i2\pi\phi(f)} = \sum_{k=-\infty}^{\infty} v_k e^{-i2\pi fk} \tag{A12.1.5}$$

is called the frequency response function of the system transfer function relationship and 
is the Fourier transform of the impulse response function. Since $v(e^{-i2\pi f})$ is complex 
valued, we write it as a product involving a gain function $G(f) = |v(e^{-i2\pi f})|$ and a phase 
function $\phi(f)$. Equation (A12.1.4) shows that the frequency response function is the ratio of 
the cross-spectrum to the input spectrum. Methods for estimating the frequency response 
function $v(e^{-i2\pi f})$ are described by Jenkins and Watts (1968). Knowing $v(e^{-i2\pi f})$, the 
impulse response function $v_k$ can then be obtained from 

$$v_k = \int_{-1/2}^{1/2} v(e^{-i2\pi f})e^{i2\pi fk}\,df \tag{A12.1.6}$$

Using a similar approach, the autocovariance generating function of the noise $n_t$ is 

$$\gamma^{nn}(B) = \gamma^{yy}(B) - \frac{\gamma^{xy}(B)\gamma^{xy}(F)}{\gamma^{xx}(B)} \tag{A12.1.7}$$

On substituting $B = e^{-i2\pi f}$ in (A12.1.7), we obtain the expression 

$$p_{nn}(f) = p_{yy}(f)[1 - \kappa_{xy}^2(f)] \tag{A12.1.8}$$

for the spectrum of the noise process, where 

$$\kappa_{xy}^2(f) = \frac{|p_{xy}(f)|^2}{p_{xx}(f)p_{yy}(f)}$$

and $\kappa_{xy}(f)$ is the coherency spectrum between the series $x_t$ and $y_t$. The coherency spectrum 
$\kappa_{xy}(f)$ at each frequency $f$ behaves like a correlation coefficient between the random 
components at frequency $f$ in the spectral representations of $x_t$ and $y_t$. Knowing the noise 
spectrum, the noise autocovariance function $\gamma_{nn}(k)$ may then be obtained from 

$$\gamma_{nn}(k) = 2\int_0^{1/2} p_{nn}(f)\cos(2\pi fk)\,df$$

By substituting estimates of the spectra such as those described in Jenkins and Watts 
(1968), estimates of the impulse response weights $v_k$ and noise autocorrelation function 
are obtained. These can be used to identify the transfer function model and noise model as 
described in Sections 12.2.1 and 6.2.1. 
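
The chain (A12.1.1), (A12.1.4), (A12.1.6) can be traced numerically. The following Python sketch is an illustration under stated assumptions, not an estimation procedure: it uses theoretical covariances in place of spectral estimates, a hypothetical set of impulse response weights `v_true`, and a hypothetical MA(1) input $x_t = (1 - 0.4B)a_t$ with unit innovation variance. The spectra are taken as plain Fourier sums of the covariances; any constant factor in the definition of the spectrum cancels in the ratio (A12.1.4).

```python
import cmath
import math


def spectrum(cov, f):
    """Sum of cov[k] * exp(-i 2 pi f k) over the lags k stored in the dict cov."""
    return sum(c * cmath.exp(-2j * math.pi * f * k) for k, c in cov.items())


# Hypothetical impulse response weights and input autocovariances.
v_true = {1: 0.5, 2: 1.0, 3: 0.25}
gamma_xx = {-1: -0.4, 0: 1.16, 1: -0.4}  # x_t = (1 - 0.4B) a_t, var(a_t) = 1

# Cross-covariances from (A12.1.1): gamma_xy(k) = sum_j v_j gamma_xx(k - j).
gamma_xy = {}
for j, vj in v_true.items():
    for lag, g in gamma_xx.items():
        gamma_xy[j + lag] = gamma_xy.get(j + lag, 0.0) + vj * g

# Frequency response from (A12.1.4) on an even grid over [-1/2, 1/2).
m = 64
freqs = [-0.5 + i / m for i in range(m)]
response = [spectrum(gamma_xy, f) / spectrum(gamma_xx, f) for f in freqs]


def impulse_weight(k):
    """Riemann-sum approximation to the inversion integral (A12.1.6)."""
    total = sum(h * cmath.exp(2j * math.pi * f * k) for h, f in zip(response, freqs))
    return (total / m).real


v_hat = {k: impulse_weight(k) for k in range(-2, 6)}
print({k: round(val, 4) for k, val in v_hat.items()})
```

No prewhitening of the input is needed here: dividing by the input spectrum at each frequency does the job that the prewhitening filter does in the time domain.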


A12.1.2 Identification of Multiple-Input Transfer Function Models 

We now generalize the model 

$$Y_t = v(B)X_t + N_t = \delta^{-1}(B)\omega(B)X_{t-b} + N_t$$

to allow for several inputs $X_{1,t}, X_{2,t}, \ldots, X_{m,t}$. Thus, 

$$Y_t = v_1(B)X_{1,t} + \cdots + v_m(B)X_{m,t} + N_t \tag{A12.1.9}$$

$$\phantom{Y_t} = \delta_1^{-1}(B)\omega_1(B)X_{1,t-b_1} + \cdots + \delta_m^{-1}(B)\omega_m(B)X_{m,t-b_m} + N_t \tag{A12.1.10}$$

where $v_j(B) = \delta_j^{-1}(B)\omega_j(B)B^{b_j}$ is the generating function of the impulse response weights 
relating $X_{j,t}$ to the output $Y_t$. We assume, as before, that after differencing, (A12.1.9) may 
be written as 

$$y_t = v_1(B)x_{1,t} + \cdots + v_m(B)x_{m,t} + n_t$$

where $y_t, x_{1,t}, \ldots, x_{m,t}$, and $n_t$ are all jointly stationary processes. Multiplying throughout by 
$x_{1,t-k}, x_{2,t-k}, \ldots, x_{m,t-k}$ in turn, taking expectations, and forming the generating functions, 
we obtain 

$$\begin{aligned}
\gamma^{x_1y}(B) &= v_1(B)\gamma^{x_1x_1}(B) + v_2(B)\gamma^{x_1x_2}(B) + \cdots + v_m(B)\gamma^{x_1x_m}(B) \\
\gamma^{x_2y}(B) &= v_1(B)\gamma^{x_2x_1}(B) + v_2(B)\gamma^{x_2x_2}(B) + \cdots + v_m(B)\gamma^{x_2x_m}(B) \\
&\;\;\vdots \\
\gamma^{x_my}(B) &= v_1(B)\gamma^{x_mx_1}(B) + v_2(B)\gamma^{x_mx_2}(B) + \cdots + v_m(B)\gamma^{x_mx_m}(B)
\end{aligned} \tag{A12.1.11}$$

On substituting $B = e^{-i2\pi f}$, the spectral equations are obtained. For example, with $m = 2$, 

$$p_{x_1y}(f) = H_1(f)p_{x_1x_1}(f) + H_2(f)p_{x_1x_2}(f)$$

$$p_{x_2y}(f) = H_1(f)p_{x_2x_1}(f) + H_2(f)p_{x_2x_2}(f)$$

and the frequency response functions $H_1(f) = v_1(e^{-i2\pi f})$ and $H_2(f) = v_2(e^{-i2\pi f})$ can be 
calculated as described in Jenkins and Watts (1968). The impulse response weights can 
then be obtained using the inverse transformation (A12.1.6). 
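
For $m = 2$ the spectral equations form a $2 \times 2$ complex linear system at each frequency. The sketch below solves it by Cramer's rule and inverts the result with (A12.1.6). It is a minimal illustration using theoretical covariances: the two inputs $x_{1,t} = a_t$ and $x_{2,t} = 0.5a_{t-1} + b_t$ (with $a_t$, $b_t$ independent unit-variance white noise), the weights `v1_true` and `v2_true`, and the convention $\gamma_{uv}(k) = E[u_t v_{t+k}]$ are all assumptions made for the example.

```python
import cmath
import math


def spectrum(cov, f):
    """Sum of cov[k] * exp(-i 2 pi f k) over the lags k stored in the dict cov."""
    return sum(c * cmath.exp(-2j * math.pi * f * k) for k, c in cov.items())


def filtered_cov(weights, cov):
    """Values of sum_l weights[l] * cov(k - l), indexed by lag k."""
    out = {}
    for l, w in weights.items():
        for lag, g in cov.items():
            out[l + lag] = out.get(l + lag, 0.0) + w * g
    return out


def add_cov(a, b):
    out = dict(a)
    for k, val in b.items():
        out[k] = out.get(k, 0.0) + val
    return out


# Hypothetical inputs: x1_t = a_t and x2_t = 0.5 a_{t-1} + b_t.
g11, g22 = {0: 1.0}, {0: 1.25}
g12, g21 = {1: 0.5}, {-1: 0.5}

v1_true = {0: 1.0, 1: 0.5}
v2_true = {2: -0.7, 3: 0.2}

# The two rows of (A12.1.11) for m = 2, written lag by lag.
g1y = add_cov(filtered_cov(v1_true, g11), filtered_cov(v2_true, g12))
g2y = add_cov(filtered_cov(v1_true, g21), filtered_cov(v2_true, g22))

m = 64
freqs = [-0.5 + i / m for i in range(m)]
h1, h2 = [], []
for f in freqs:
    p11, p12 = spectrum(g11, f), spectrum(g12, f)
    p21, p22 = spectrum(g21, f), spectrum(g22, f)
    p1y, p2y = spectrum(g1y, f), spectrum(g2y, f)
    det = p11 * p22 - p12 * p21          # nonzero unless the inputs are collinear
    h1.append((p1y * p22 - p12 * p2y) / det)
    h2.append((p11 * p2y - p21 * p1y) / det)


def invert(h, k):
    """Riemann-sum approximation to the inversion integral (A12.1.6)."""
    total = sum(val * cmath.exp(2j * math.pi * f * k) for val, f in zip(h, freqs))
    return (total / m).real


v1_hat = {k: invert(h1, k) for k in range(5)}
v2_hat = {k: invert(h2, k) for k in range(5)}
print({k: round(val, 4) for k, val in v1_hat.items()})
print({k: round(val, 4) for k, val in v2_hat.items()})
```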


APPENDIX A12.2 CHOICE OF INPUT TO PROVIDE OPTIMAL 
PARAMETER ESTIMATES 

Suppose that the input to a dynamic system can be made to follow an imposed stochastic 
process that is our choice. For example, it might be an autoregressive process, a moving 
average process, or white noise. To illustrate the problems involved in the optimal selection 
of this stochastic process, it is sufficient to consider an elementary example. 

A12.2.1 Design of Optimal Inputs for a Simple System 

Suppose that a system is under study for which the transfer function-noise model is assumed 
to be 

$$Y_t = \beta_1Y_{t-1} + \beta_2X_{t-1} + a_t \qquad |\beta_1| < 1 \tag{A12.2.1}$$

where $a_t$ is white noise. It is also assumed that the input and output processes are stationary 
and that $X_t$ and $Y_t$ denote deviations of these processes from their respective means. For 
large samples, and associated with any fixed probability, the approximate area of the 
Bayesian HPD region for $\beta_1$ and $\beta_2$, and also of the corresponding confidence region, is 
proportional to $\Delta^{-1/2}$, where $\Delta$ is the determinant 

$$\Delta = \begin{vmatrix} E[Y_t^2] & E[Y_tX_t] \\ E[Y_tX_t] & E[X_t^2] \end{vmatrix}$$

We will proceed by attempting to find the design minimizing the area of the HPD or 
confidence region and thus maximizing $\Delta$. Now 

$$\begin{aligned}
E[Y_t^2] &= \sigma_Y^2 = \sigma_X^2\beta_2^2\,\frac{1 + 2q}{1 - \beta_1^2} + \frac{\sigma_a^2}{1 - \beta_1^2} \\
E[Y_tX_t] &= \sigma_X^2\,\frac{\beta_2}{\beta_1}\,q \\
E[X_t^2] &= \sigma_X^2
\end{aligned} \tag{A12.2.2}$$

where 

$$q = \sum_{i=1}^{\infty}\beta_1^i\rho_i \qquad \sigma_X^2\rho_i = E[X_tX_{t-i}]$$

The value of the determinant may be written in terms of $\sigma_X^2$ as 

$$\Delta = \frac{\sigma_X^2\sigma_a^2}{1 - \beta_1^2} + \frac{\sigma_X^4\beta_2^2}{(1 - \beta_1^2)^2} - \frac{\sigma_X^4\beta_2^2}{\beta_1^2}\left(q - \frac{\beta_1^2}{1 - \beta_1^2}\right)^2 \tag{A12.2.3}$$

Thus, as might be expected, the area of the region can be made small by making $\sigma_X^2$ large 
(i.e., by varying the input variable over a wide range). In practice, there may be limits to 
the amount of variation that can be allowed in $X$. Let us proceed by first supposing that $\sigma_X^2$ 
is held fixed at some specified value. 


Solution with $\sigma_X^2$ Fixed. With $(1 - \beta_1^2) > 0$ and for any fixed $\sigma_X^2$, we see from (A12.2.3) 
that $\Delta$ is maximized by setting 

$$q = \frac{\beta_1^2}{1 - \beta_1^2}$$

that is, 

$$\beta_1\rho_1 + \beta_1^2\rho_2 + \beta_1^3\rho_3 + \cdots = \beta_1^2 + \beta_1^4 + \beta_1^6 + \cdots$$

There are an infinite number of ways in which, for given $\beta_1$, this equality could be achieved. 
One obvious solution is 

$$\rho_i = \beta_1^i$$

Thus, one way to maximize $\Delta$ for fixed $\sigma_X^2$ would be to force the input to follow the 
autoregressive process 

$$(1 - \beta_1B)X_t = \alpha_t$$

where $\alpha_t$ is a white noise process with variance $\sigma_\alpha^2 = \sigma_X^2(1 - \beta_1^2)$. 

Solution with $\sigma_Y^2$ Fixed. So far we have supposed that $\sigma_Y^2$ is unrestricted. In some cases, 
we might wish to avoid too great a variation in the output rather than in the input. Suppose 
that $\sigma_Y^2$ is held equal to some fixed acceptable value but that $\sigma_X^2$ is unrestricted. Then the 
value of the determinant $\Delta$ can be written in terms of $\sigma_Y^2$ as 

$$\Delta = \frac{\sigma_Y^4}{\beta_2^2}\left\{\frac{1}{r} - \frac{\beta_1^2}{s^2}\left(\frac{q + s}{1 + 2q}\right)^2\right\} \tag{A12.2.4}$$

where 

$$s = \frac{\beta_1^2r}{1 + \beta_1^2r} \tag{A12.2.5}$$

and 

$$r = \frac{\sigma_Y^2}{\sigma_Y^2 - \sigma_a^2} \tag{A12.2.6}$$

The maximum is achieved by setting 

$$q = -s = \frac{-\beta_1^2r}{1 + \beta_1^2r} \tag{A12.2.7}$$

that is, 

$$\beta_1\rho_1 + \beta_1^2\rho_2 + \beta_1^3\rho_3 + \cdots = -\beta_1^2r + \beta_1^4r^2 - \beta_1^6r^3 + \cdots$$

There are again infinitely many ways of satisfying this equality. In particular, one solution is 

$$\rho_i = (-\beta_1r)^i \tag{A12.2.8}$$

which can be obtained by forcing the input to follow the autoregressive process 

$$(1 + \beta_1rB)X_t = \alpha_t \tag{A12.2.9}$$

where $\alpha_t$ is a white noise process with variance $\sigma_\alpha^2 = \sigma_X^2(1 - \beta_1^2r^2)$. Since $r$ is essentially 
positive, the sign of the parameter $(-\beta_1r)$ of this autoregressive process is opposite to that 
obtained for the optimal input with $\sigma_X^2$ fixed. 


Solution with $\sigma_Y^2 \times \sigma_X^2$ Fixed. In practice, it might happen that excessive variations in 
input and output were both to be avoided. If it were true that a given percentage decrease in 
the variance of $X$ was equally as desirable as the same percentage decrease in the variance 
of $Y$, it would be sensible to maximize $\Delta$ subject to a fixed value of the product $\sigma_X^2 \times \sigma_Y^2$. 
The determinant is 

$$\Delta = \sigma_X^2\sigma_Y^2 - \frac{\sigma_X^4\beta_2^2}{\beta_1^2}\,q^2 \tag{A12.2.10}$$

which is maximized for fixed $\sigma_X^2\sigma_Y^2$ only if $q = 0$. Once again there are an infinite number 
of solutions. However, by using a white noise input, $\Delta$ is maximized whatever the value of 
$\beta_1$. For such an input, using (A12.2.2), $\sigma_X^2$ is the positive root of 

$$\sigma_X^4\beta_2^2 + \sigma_X^2\sigma_a^2 - k(1 - \beta_1^2) = 0 \tag{A12.2.11}$$

where $k = \sigma_X^2\sigma_Y^2$, which is fixed. 


A12.2.2 Numerical Example 

Suppose that we were studying the first-order dynamic system (A12.2.1) with $\beta_1 = 0.50$ 
and $\beta_2 = 1.00$, so that 

$$Y_t = 0.50Y_{t-1} + 1.00X_{t-1} + a_t$$

where $\sigma_a^2 = 0.2$. 

$\sigma_X^2$ Fixed, $\sigma_Y^2$ Unrestricted. Suppose at first that the design is chosen to maximize $\Delta$ with 
$\sigma_X^2 = 1.0$. Then one optimal choice for the input $X_t$ will be the autoregressive process 

$$(1 - 0.5B)X_t = \alpha_t$$

where the white noise process $\alpha_t$ would have variance $\sigma_\alpha^2 = \sigma_X^2(1 - \beta_1^2) = 0.75$. Using 
(A12.2.2), the variance $\sigma_Y^2$ of the output would be 2.49, and the scheme will achieve a 
Bayesian region for $\beta_1$ and $\beta_2$ whose area is proportional to $\Delta^{-1/2} = 0.70$. 

$\sigma_Y^2$ Fixed, $\sigma_X^2$ Unrestricted. The above scheme is optimal under the assumption that the 
input variance is $\sigma_X^2 = 1$ and the output variance is unrestricted. This output variance then 
turns out to be $\sigma_Y^2 = 2.49$. If, instead, the input variance were unrestricted, then with a 
fixed output variance of 2.49, we could, of course, do considerably better. In fact, using 
(A12.2.6), $r = 1.087$ and hence $\beta_1r \approx 0.54$, so that from (A12.2.9) one optimal choice for 
the unrestricted input would be the autoregressive process 

$$(1 + 0.54B)X_t = \alpha_t$$

where in this case $\alpha_t$ is a white noise process with $\sigma_\alpha^2 = \sigma_X^2(1 - \beta_1^2r^2)$. Using (A12.2.2) 
with $\sigma_Y^2 = 2.49$ fixed and $q = -0.214$ from (A12.2.7), the variance $\sigma_X^2$ of the input would 
now be increased to 2.91, so that $\sigma_\alpha^2 = 2.05$, and $\Delta^{-1/2}$, which measures the area of the 
Bayesian region, would be reduced to $\Delta^{-1/2} = 0.42$. 

Product $\sigma_X^2 \times \sigma_Y^2$ Fixed. Finally, we consider a scheme that attempts to control both $\sigma_Y^2$ 
and $\sigma_X^2$ by maximizing $\Delta$ with $\sigma_X^2 \times \sigma_Y^2$ fixed. In the previous example in which $\sigma_Y^2$ was 
fixed, we found that $\Delta^{-1/2} = 0.42$ with $\sigma_X^2 = 2.91$ and $\sigma_Y^2 = 2.49$, so that the product is 
$2.91 \times 2.49 = 7.25$. If our objective had been to minimize $\Delta^{-1/2}$ while keeping this product 
equal to 7.25, we could have made an optimal choice without knowledge of $\beta_1$ by choosing a 
white noise input $X_t = \alpha_t$. Using (A12.2.11), $\sigma_X^2 = \sigma_\alpha^2 = 2.29$, $\sigma_Y^2 = 3.16$, and in this case, 
as expected, $\Delta^{-1/2} = 0.37$, slightly smaller than that in the previous example. 

It is worth considering this example in terms of spectral ideas. To optimize with $\sigma_X^2$ 
fixed, we have used an autoregressive input with $\phi$ positive that has high power at 
low frequencies. Since the gain of the system is high at low frequencies, this achieves 
maximum transfer from $X$ to $Y$ and so induces large variations in $Y$. When $\sigma_Y^2$ is fixed, 
we have introduced an input that is an autoregressive process with $\phi$ negative. This has 
high power at high frequencies. Since there is minimum transfer from $X$ to $Y$ at high 
frequencies, the disturbance in $X$ must now be made large at these frequencies. When the 
product $\sigma_X^2 \times \sigma_Y^2$ is fixed, the “compromise” input white noise is indicated and does not 
require knowledge of $\beta_1$. This final maximization of $\Delta$ is equivalent to minimizing the 
(magnitude of the) correlation between the estimates $\hat{\beta}_1$ and $\hat{\beta}_2$, and in fact the correlation 
between these estimates is zero when a white noise input is used. 
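
The $\Delta^{-1/2}$ values quoted in the numerical example can be checked directly from the moment formulas. The sketch below is an illustrative calculation only: it evaluates the determinant from (A12.2.2) for the three inputs, using $r$ as in (A12.2.6), $q$ as in (A12.2.7), and the positive root of the quadratic (A12.2.11) for the white noise case. The function and variable names are hypothetical.

```python
import math

beta1, beta2, sigma2_a = 0.50, 1.00, 0.2


def moments(sigma2_x, q):
    """Output variance and E[Y_t X_t] from (A12.2.2)."""
    var_y = (sigma2_x * beta2 ** 2 * (1 + 2 * q) + sigma2_a) / (1 - beta1 ** 2)
    cov_yx = sigma2_x * (beta2 / beta1) * q
    return var_y, cov_yx


def area_factor(sigma2_x, q):
    """Delta^(-1/2), where Delta = E[Y^2] E[X^2] - (E[YX])^2."""
    var_y, cov_yx = moments(sigma2_x, q)
    return (var_y * sigma2_x - cov_yx ** 2) ** -0.5


# Case 1: input variance fixed at 1, AR(1) input with rho_i = beta1^i.
q1 = beta1 ** 2 / (1 - beta1 ** 2)
var_y1, _ = moments(1.0, q1)
case1 = area_factor(1.0, q1)

# Case 2: output variance fixed at var_y1, AR(1) input with rho_i = (-beta1 r)^i.
r = var_y1 / (var_y1 - sigma2_a)
q2 = -beta1 ** 2 * r / (1 + beta1 ** 2 * r)
sigma2_x2 = (var_y1 * (1 - beta1 ** 2) - sigma2_a) / (beta2 ** 2 * (1 + 2 * q2))
case2 = area_factor(sigma2_x2, q2)

# Case 3: product of variances fixed at k, white noise input (q = 0);
# the input variance is the positive root of the quadratic (A12.2.11).
k = sigma2_x2 * var_y1
disc = sigma2_a ** 2 + 4 * beta2 ** 2 * k * (1 - beta1 ** 2)
sigma2_x3 = (-sigma2_a + math.sqrt(disc)) / (2 * beta2 ** 2)
case3 = area_factor(sigma2_x3, 0.0)

print(round(case1, 2), round(case2, 2), round(case3, 2))
```

In the white noise case the determinant reduces to the fixed product $k$ itself, so $\Delta^{-1/2} = k^{-1/2}$ regardless of $\beta_1$, which is why this design needs no prior knowledge of the dynamics.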

Conclusions. This investigation shows the following: 

1. The optimal choice of design rests heavily on how we define “optimal.” 

2. Both in the case where $\sigma_X^2$ is held fixed and in the case where $\sigma_Y^2$ is held fixed, 
the optimal choices require specific stochastic processes for the input $X_t$, whose 
parameters are functions of the unknown dynamic parameters. Thus, we are in the 
familiar paradoxical situation where we can do a better job of data gathering only to 
the extent that we already know something about the answer we seek. A sequential 
approach, where we improve the design as we find out more about the parameters, 
is a possibility worth further investigation. In particular, a pilot investigation using 
a possibly nonoptimal input, say white noise, could be used to generate data from 
which preliminary estimates of the dynamic parameters could be obtained. These 
estimates could then be used to specify a further input using one of our previous 
criteria. 

3. The use of white noise is shown, for the simple case investigated, to be optimal for a 
sensible criterion of optimality, and its use as an input requires no prior knowledge 
of the parameters. 


EXERCISES 

12.1. Estimate the cross-correlation function at lags −1, 0, and +1 for the following 
series of five pairs of observations: 

 t       1     2     3     4     5
 x_t    11     7     8    12    14
 y_t     7    10     6     7    10

12.2. If two series may be represented in $\psi$-weight form as 

$$y_t = \psi_y(B)a_t \qquad x_t = \psi_x(B)a_t$$

(a) Show that their cross-covariance generating function 

$$\gamma^{xy}(B) = \sum_{k=-\infty}^{\infty}\gamma_{xy}(k)B^k$$

is given by $\sigma_a^2\psi_y(B)\psi_x(F)$. 

(b) Use the above result to obtain the cross-covariance function between $y_t$ and $x_t$ 
when 

$$y_t = (1 - \theta B)a_t \qquad x_t = (1 - \theta_1B - \theta_2B^2)a_t$$

12.3. After estimating a prewhitening transformation $\theta_x^{-1}(B)\phi_x(B)x_t = \alpha_t$ for an input 
series $x_t$ and then computing the transformed output $\beta_t = \theta_x^{-1}(B)\phi_x(B)y_t$, 
cross-correlations $r_{\alpha\beta}(k)$ were obtained as follows: 

 k    r_αβ(k)   |   k    r_αβ(k)
 0     0.05     |   5     0.24
 1     0.31     |   6     0.07
 2     0.52     |   7    -0.03
 3     0.43     |   8     0.10
 4     0.29     |   9     0.07

with $\hat{\sigma}_\alpha = 1.26$, $\hat{\sigma}_\beta = 2.73$, and $n = 187$. 

(a) Obtain approximate standard errors for the cross-correlations. 

(b) Calculate rough estimates for the impulse response weights $v_j$ of a transfer 
function between $y_t$ and $x_t$. 

(c) Suggest a model form for the transfer function and give rough estimates of its 
parameters. 

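12.4. It is frequently the case that the user of an estimated transfer function-noise model 
$y_t = \delta^{-1}(B)\omega(B)B^bx_t + n_t$ will want to establish whether the steady-state gain 
$g = \delta^{-1}(1)\omega(1)$ makes good sense. 

(a) For the first-order transfer function system 

$$y_t = \frac{\omega_0}{1 - \delta B}\,x_{t-1}$$

show that an approximate standard error $\hat{\sigma}(\hat{g})$ of the estimate $\hat{g} = \hat{\omega}_0/(1 - \hat{\delta})$ is 
given by 

$$\frac{\hat{\sigma}^2(\hat{g})}{\hat{g}^2} \approx \frac{\mathrm{var}[\hat{\omega}_0]}{\hat{\omega}_0^2} + \frac{\mathrm{var}[\hat{\delta}]}{(1 - \hat{\delta})^2} + \frac{2\,\mathrm{cov}[\hat{\omega}_0, \hat{\delta}]}{\hat{\omega}_0(1 - \hat{\delta})}$$

(b) Calculate $\hat{g}$ and an approximate value for $\hat{\sigma}(\hat{g})$ when $\hat{\omega}_0 = 5.2$, $\hat{\delta} = 0.65$, 
$\hat{\sigma}(\hat{\omega}_0) = 0.5$, $\hat{\sigma}(\hat{\delta}) = 0.1$, and $\mathrm{cov}[\hat{\omega}_0, \hat{\delta}] = 0.025$. 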
12.5. Consider the regression model 

$$Y_t = \beta_1X_{1,t} + \beta_2X_{2,t} + N_t$$

where $N_t$ is a nonstationary error term following an IMA(0, 1, 1) process 
$\nabla N_t = a_t - \theta a_{t-1}$. Show that the regression model may be rewritten in the form 

$$Y_t - \bar{Y}_{t-1} = \beta_1(X_{1,t} - \bar{X}_{1,t-1}) + \beta_2(X_{2,t} - \bar{X}_{2,t-1}) + a_t$$

where $\bar{Y}_{t-1}$, $\bar{X}_{1,t-1}$, and $\bar{X}_{2,t-1}$ are exponentially weighted moving averages so that, 
for example, 

$$\bar{Y}_{t-1} = (1 - \theta)(Y_{t-1} + \theta Y_{t-2} + \theta^2Y_{t-3} + \cdots)$$

It will be seen that the fitting of this regression model with nonstationary noise 
by maximum likelihood is equivalent to fitting the deviations of the independent 
and dependent variables from local updated exponentially weighted moving averages 
by ordinary least-squares. (Refer to Section 9.5.1 for related ideas regarding 
transformation of regression models with autocorrelated noise $N_t$.) 

12.6. Quarterly measurements of unemployment and the gross domestic product (GDP) 
in the United Kingdom over the period 1955-1969 are included in Series P in Part 
Five of this book; see also http://pages.stat.wisc.edu/~reinsel/bjr-data/. 

(a) Plot the two time series using R. 

(b) Calculate and plot the autocorrelation and partial autocorrelation functions 
of the two series. Repeat the calculations for the first differences of the 
two series. Would a variance stabilizing transformation be helpful for model 
development? 

(c) Calculate and plot the cross-correlation function between the two series. 

12.7. Refer to Exercise 12.6. Build (identify, estimate, and check) a transfer 
function-noise model that uses the GDP series $X_t$ as input to help explain variations 
in the logged unemployment series $Y_t$. 

12.8. Consider the transfer function-noise model fitted to the gas furnace data in (12.4.1) 
and (12.4.2). Note that the estimate of $\delta_2$ is very close to zero. Re-estimate the 
parameters of this model setting $\delta_2$ equal to zero. Describe the resulting impact on 
the estimate of the residual variance and other model parameters. 

12.9. A bivariate time series consisting of sales data and a leading indicator is listed as 
Series M in Part Five of this book. The series is also available as “BJsales” in the 
datasets package of R. 

(a) Plot the two time series using R. 

(b) Calculate and plot the autocorrelation and partial autocorrelation functions of 
the two series. Find a suitable model for the leading indicator series. 

(c) Calculate and plot the cross-correlation function between the two variables. 

(d) Calculate and plot the cross-correlation function after prewhitening the series 
using the time series model developed in part (b). 

(e) Estimate the impulse response function $v_k$ for the two series. 
12.10. Refer to Exercise 12.9. A bivariate transfer function-noise model was given for 
these series in Section 12.5.3. 

(a) Use the results from Exercise 12.9 to justify the choice of transfer function 
model. Derive preliminary estimates of the parameters in this model. 

(b) Justify the choice of the noise model given in Section 12.5.3. 

(c) Estimate the parameters of the combined transfer function-noise model and 
perform the appropriate diagnostic checks on the fitted model. 
INTERVENTION ANALYSIS, OUTLIER 
DETECTION, AND MISSING VALUES 


Time series are often affected by special events or conditions such as policy changes, strikes, 
advertising promotions, environmental regulations, and similar events, which we will refer 
to as intervention events. In Section 13.1, we describe the method of intervention analysis, 
which can account for the expected effects of these interventions. For this, the transfer 
function models of the previous chapters are used, but in the intervention analysis model, 
the input series will be in the form of a simple pulse or step indicator function to signal 
the presence or absence of the event. The timing of the intervention event is assumed to 
be known in this analysis. Section 13.2 considers the related problem of detecting outlying 
or unusual behavior in a time series at an unknown point of time. Depending on how the 
outlier enters and its likely impact on the time series, two types of outlier models, additive 
outlier (AO) and innovational outlier (IO) models, are considered. A somewhat related 
problem of missing values in a time series is discussed in Section 13.3. The key focus 
of this section is on parameter estimation and evaluation of the likelihood function of an 
ARMA model for time series with missing values. However, consideration is also given to 
estimation of the missing values in the series. 


13.1 INTERVENTION ANALYSIS METHODS 

13.1.1 Models for Intervention Analysis 

In the setting of intervention analysis, it is assumed that an intervention event has occurred 
at a known point in time $T$ of a time series. It is of interest to determine whether there is 
any evidence of a change or effect, of an expected kind, on the time series $Y_t$ associated 
with the event. We consider the use of transfer function models to model the nature of 
and estimate the magnitude of the effects of the intervention, and hence to account for the 
possible unusual behavior in the time series related to the event. Based on the study by Box 
and Tiao (1975), the type of model we consider has the form 

$$Y_t = \frac{\omega(B)B^b}{\delta(B)}\,\xi_t + N_t \tag{13.1.1}$$

where the term $\mathcal{Y}_t = \delta^{-1}(B)\omega(B)B^b\xi_t$ represents the effects of the intervention event in 
terms of the deterministic input series $\xi_t$, and $N_t$ is the noise series that represents the 
underlying time series without the intervention effects. It is assumed that $N_t$ follows 
an ARIMA($p, d, q$) model, $\varphi(B)N_t = \theta(B)a_t$, with $\varphi(B) = \phi(B)(1 - B)^d$. Multiplicative 
seasonal ARIMA models as presented in Chapter 9 can also be included for $N_t$, but special 
note of the seasonal models will not be made in this chapter. 

There are two common types of deterministic input variables that have been found 
useful to represent the impact of intervention events on a time series. Both of these are 
indicator variables taking only the values 0 and 1 to denote the nonoccurrence and occurrence 
of the intervention. One type is a step function at time $T$, given by 

$$S_t^{(T)} = \begin{cases} 0 & t < T \\ 1 & t \ge T \end{cases} \tag{13.1.2}$$

which would typically be used to represent the effects of an intervention that are expected 
to remain permanently after time $T$ to some extent. The other type is a pulse function at $T$, 
given by 

$$P_t^{(T)} = \begin{cases} 0 & t \ne T \\ 1 & t = T \end{cases} \tag{13.1.3}$$

which could represent the effects of an intervention that are temporary or transient and will 
die out after time $T$. These indicator input variables are used in many situations where the 
effects of the intervention cannot be represented as the response to a quantitative variable 
because such a quantitative variable does not exist or it is impractical or impossible to 
obtain measurements on such a variable. 

Because of the deterministic nature of the indicator input series $\xi_t$ in (13.1.1), unlike 
the transfer function model situation of Chapter 12, identification of the structure of the 
intervention model operator $v(B) = \delta^{-1}(B)\omega(B)B^b$ cannot be based on the technique of 
prewhitening. Instead, it is necessary to postulate the form of the intervention model by 
considering the mechanisms that might cause the change or effect and the implied form 
of the change that would be expected. In addition, the identification may be aided by 
direct inspection of the data to suggest the form of effect due to the known event, and 
supplementary evidence may sometimes be available from examination of the residuals 
from a model fitted before the intervention term is introduced. 
FIGURE 13.1 Responses to a step and a pulse input: (a-c) Response to a step input for various 
simple transfer function models, and (d-f) Response to a pulse input for some common models of 
interest. 


Response Patterns Useful in Intervention Analysis. Several different response patterns 

$$\mathcal{Y}_t = \delta^{-1}(B)\omega(B)B^b\xi_t$$

are possible through different choices of the transfer function. Figure 13.1 shows the 
responses for various simple transfer functions with both step and pulse indicators as 
input. For example, the model $\mathcal{Y}_t = \omega BS_t^{(T)}$ in Figure 13.1(a) can be used to represent a 
permanent step change in level of unknown magnitude $\omega$ after time $T$, while the form 

$$\mathcal{Y}_t = \frac{\omega B}{1 - \delta B}\,S_t^{(T)} \qquad 0 < \delta < 1 \tag{13.1.4}$$

in Figure 13.1(b), which implies that $\mathcal{Y}_t = \omega(1 - \delta^{t-T})/(1 - \delta)$, $t > T$, corresponds to a 
gradual change with rate $\delta$ that eventually approaches the long-run change in level equal 
to $\omega/(1 - \delta)$. Similarly, the model 

$$\mathcal{Y}_t = \frac{\omega_1B}{1 - \delta B}\,P_t^{(T)} \qquad 0 < \delta < 1 \tag{13.1.5}$$

in Figure 13.1(d), which implies that $\mathcal{Y}_t = \omega_1\delta^{t-T-1}$, $t > T$, would represent a sudden 
“pulse” change after time $T$ of unknown magnitude $\omega_1$, followed by a gradual decay of 
rate $\delta$ back to the original preintervention level with no permanent effect. More complex 
response patterns can be obtained by various linear combinations of the simpler forms, 
such as in the case of Figure 13.1(f). It is also noted that since $(1 - B)S_t^{(T)} = P_t^{(T)}$, any 
of the transfer function models that involve $S_t^{(T)}$ could equally well be represented in 
terms of $P_t^{(T)}$. 
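
The closed-form responses quoted for (13.1.4) and (13.1.5) follow from the recursion $\mathcal{Y}_t = \delta\mathcal{Y}_{t-1} + \omega\xi_{t-1}$ implied by the operator $\omega B/(1 - \delta B)$. The short Python sketch below generates both responses by this recursion; the values $\omega = 2$, $\delta = 0.6$, $T = 5$ and the function name `response` are arbitrary choices made for illustration.

```python
def response(omega, delta, indicator):
    """Apply omega*B/(1 - delta*B) to an indicator series using
    y_t = delta*y_{t-1} + omega*indicator_{t-1}, starting from zero."""
    out, prev_y, prev_in = [], 0.0, 0.0
    for value in indicator:
        y = delta * prev_y + omega * prev_in
        out.append(y)
        prev_y, prev_in = y, value
    return out


T, n = 5, 30
omega, delta = 2.0, 0.6

step = [1.0 if t >= T else 0.0 for t in range(n)]    # S_t^(T)
pulse = [1.0 if t == T else 0.0 for t in range(n)]   # P_t^(T)

step_resp = response(omega, delta, step)
pulse_resp = response(omega, delta, pulse)

for t in range(T, T + 6):
    print(t, round(step_resp[t], 4), round(pulse_resp[t], 4))
```

Differencing the step response reproduces the pulse response, which is the time-domain counterpart of the identity $(1 - B)S_t^{(T)} = P_t^{(T)}$.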

The following additional points concerning the intervention models are worthy of note. 
The function $\mathcal{Y}_t$ represents the additional effect of the intervention event over the noise or 
“background” series $N_t$. Hence, when possible, the model $N_t = [\theta(B)/\varphi(B)]a_t$ for the 
noise is identified based on the usual procedures applied to the time series observations 
available before the date of the intervention, that is, $Y_t$, $t < T$. Also, it is assumed in model 
(13.1.1) that only the level of the series is affected by the intervention and, in particular, 
that the form and the parameters of the time series model for $N_t$ are the same before and 
after the intervention. One should also recognize that there can be considerable differences 
in the accuracy with which the intervention model parameters can be estimated depending 
on whether the noise $N_t$ is stationary or nonstationary, as well as on whether permanent or 
transitory effects are postulated. 

In general, the parameter estimates and their standard errors for the intervention model

$$Y_t = \frac{\omega(B)B^b}{\delta(B)}\,\xi_t + \frac{\theta(B)}{\varphi(B)}\,a_t \tag{13.1.6}$$

are obtained by the least-squares method of estimation for transfer function-noise models,
as described in Section 12.3. Diagnostic checking based on the residuals $\hat{a}_t$ from the fitted
model can also be performed using methods similar to those previously employed to assess
the adequacy of a fitted model.


13.1.2 Example of Intervention Analysis 

Box and Tiao (1975) considered the monthly time series consisting of the rate of change 
in the U.S. consumer price index (CPI) for the period July 1953 through December 1972. 
Beginning in September 1971, phase I economic control went into effect for 3 months, and 
after that phase II was in effect. The problem was to investigate the possible effect of the 
phase I and II controls on the rate of change in the CPI. 

Inspection of the sample autocorrelation functions of the rate of change of the CPI and 
its first differences for the 218 monthly observations prior to phase I suggested a noise 
model of the form 


$$(1 - B)N_t = (1 - \theta B)a_t \tag{13.1.7}$$

with maximum likelihood estimates $\hat{\theta} = 0.84$ and $\hat{\sigma}_a = 0.0019$. Examination of the residuals
and their autocorrelations reveals no obvious inadequacies in this model. 

Then, to address the question of the possible effects of phase I and II controls, it is 
assumed that phase I and II are expected to produce changes in the level of the rate of 
change of the CPI, and that the form of the noise model remains the same. Based on these 
assumptions, the appropriate model to assess the impact of the controls is 

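$$Y_t = \omega_1 \xi_{1t} + \omega_2 \xi_{2t} + \frac{1 - \theta B}{1 - B}\,a_t \tag{13.1.8}$$

where

$$\xi_{1t} = \begin{cases} 1 & t = \text{September, October, or November 1971} \\ 0 & \text{otherwise} \end{cases}$$

$$\xi_{2t} = \begin{cases} 1 & t \geq \text{December 1971} \\ 0 & \text{otherwise} \end{cases}$$
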

The nonlinear least-squares estimates of the parameters in model (13.1.8) were obtained, 
with standard errors in parentheses, as 

$$\hat{\theta} = 0.85\ (0.05) \qquad \hat{\omega}_1 = -0.0022\ (0.0010) \qquad \hat{\omega}_2 = -0.0008\ (0.0009)$$


Hence, the analysis suggests that a drop in the rate of increase of the CPI is associated with 
phase I, but the effect of phase II is much less certain. 
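
A model of the form (13.1.8) can be fitted in base R by passing the two indicator series to `arima` as regressors. The sketch below is only an illustration under stated assumptions: the CPI data are not reproduced here, so a stand-in series is simulated (in units of 0.001) with parameter values chosen to resemble the reported estimates, and the seed, variable names, and scaling are all hypothetical choices.

```r
set.seed(123)

# Monthly index: 234 months (July 1953 to December 1972);
# month 219 corresponds to September 1971, so 218 months precede phase I
n <- 234
phase1 <- as.numeric(seq_len(n) %in% 219:221)   # September to November 1971
phase2 <- as.numeric(seq_len(n) >= 222)         # December 1971 onward

# Simulated stand-in for the CPI rate-of-change series (assumed values)
theta <- 0.85; sigma_a <- 1.9
omega1 <- -2.2; omega2 <- -0.8
a <- rnorm(n, sd = sigma_a)
N <- cumsum(a - theta * c(0, a[-n]))            # (1 - B) N_t = (1 - theta*B) a_t
Y <- omega1 * phase1 + omega2 * phase2 + N

# IMA(0,1,1) noise with the two intervention indicators as regressors
X <- cbind(phase1 = phase1, phase2 = phase2)
fit <- arima(Y, order = c(0, 1, 1), xreg = X)

est <- coef(fit)
se <- sqrt(diag(fit$var.coef))
# R writes the MA operator as (1 + ma1*B), so theta-hat in the text's notation is -ma1
round(cbind(estimate = est, std.error = se), 3)
```

Because the noise is nonstationary, the standard errors of the two intervention coefficients are not expected to shrink much as more postintervention data accumulate, a point taken up in Section 13.1.3.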

Many other examples of the use of intervention analysis have appeared in the literature. 
These include studies of the effects of regulations for engine design changes in new cars 
on oxidant pollution levels in the Los Angeles area (Box and Tiao, 1975), the effect of 
a change in policy in relation to debt collection on bad debt collections (Jenkins, 1979), 
the effectiveness of seat belt legislation on road deaths (Bhattacharyya and Layton, 1979), 
and the impact of the Arab oil embargo on electricity consumption in the United States 
(Montgomery and Weatherby, 1980). 


13.1.3 Nature of the ML Estimator for a Simple Level Change Model 

It is instructive to consider the nature of the maximum likelihood estimator of the intervention
parameters, such as those in (13.1.8), for some relatively simple situations. We
consider the simple model

$$Y_t = \omega \xi_t + N_t \tag{13.1.9}$$

where $N_t = \varphi^{-1}(B)\theta(B)a_t$. This model can be written, formally, as

$$\pi(B)Y_t = \omega \pi(B)\xi_t + a_t \tag{13.1.10}$$

where $\pi(B) = \theta^{-1}(B)\varphi(B) = 1 - \sum_{i=1}^{\infty} \pi_i B^i$. Letting $w_t = \pi(B)Y_t$ and $x_t = \pi(B)\xi_t$, we
can write (13.1.10) in the form of a simple linear model $w_t = \omega x_t + a_t$, $t = 1, 2, \ldots, n$.
Hence, the maximum likelihood estimator of $\omega$ is approximately

$$\hat{\omega} = \frac{\sum_{t=1}^{n} x_t w_t}{\sum_{t=1}^{n} x_t^2} \tag{13.1.11}$$

with $\mathrm{var}[\hat{\omega}] = \sigma_a^2 / \sum_{t=1}^{n} x_t^2$.

Example with a Step Change Input and Nonstationary Noise. Let us consider a special
case of (13.1.9) where $\xi_t = B S_t^{(T)}$ represents a step change after time $T$. Then,
$x_t = \pi(B)B S_t^{(T)} = 1 - \sum_{i=1}^{t-T-1} \pi_i$, $t \geq T + 1$, with $x_{T+1} = 1$ and $x_t = 0$ for $t \leq T$. For
the discussion that follows, we suppose that $n$ is large, and that a relatively large number
of observations are available before and after the intervention time $T$.

Now suppose that the noise $N_t$ in (13.1.9) is nonstationary with generalized autoregressive
operator $\varphi(B) = \phi(B)(1 - B)$ so that

$$\pi(B) = \theta^{-1}(B)\phi(B)(1 - B) = \tilde{\pi}(B)(1 - B)$$

with $\tilde{\pi}(B) = \theta^{-1}(B)\phi(B) = 1 - \sum_{j=1}^{\infty} \tilde{\pi}_j B^j$. Then, $x_t = \tilde{\pi}(B)B(1 - B)S_t^{(T)} = \tilde{\pi}(B)P_{t-1}^{(T)} = -\tilde{\pi}_{t-T-1}$, $t \geq T + 1$ (taking $\tilde{\pi}_0 = -1$), and hence

$$\sum_{t=1}^{n} x_t^2 = \sum_{t=T+1}^{n} \tilde{\pi}_{t-T-1}^2 \simeq \sum_{i=0}^{\infty} \tilde{\pi}_i^2$$

Also, $w_t = \pi(B)Y_t = Y_t - \bar{Y}_{t-1}$, where $\bar{Y}_{t-1} = \sum_{i=1}^{\infty} \pi_i Y_{t-i}$ is a weighted average of values
prior to $t$ (since $\sum_{i=1}^{\infty} \pi_i = 1$ when $d = 1$). Following the results in Box and Tiao (1975), it
can then be shown that

$$\sum_{t=1}^{n} x_t w_t = -\sum_{t=T+1}^{n} \tilde{\pi}_{t-T-1}\, w_t \simeq \tilde{\pi}(B)\tilde{\pi}(F)(1 - B)Y_{T+1} = \sum_{s=0}^{\infty} \alpha_s Y_{T+1+s} - \sum_{s=0}^{\infty} \alpha_s Y_{T-s}$$

where $\alpha_s = \eta_s - \eta_{s+1}$ and the $\eta_s$ are coefficients in $\tilde{\pi}(B)\tilde{\pi}(F) = \eta_0 + \sum_{s=1}^{\infty} \eta_s (B^s + F^s)$,
such that $\sum_{s=0}^{\infty} \alpha_s = \eta_0 = \sum_{i=0}^{\infty} \tilde{\pi}_i^2$. Therefore, in this situation, the maximum likelihood
estimator of $\omega$ is

$$\hat{\omega} = \frac{\sum_{t=1}^{n} x_t w_t}{\sum_{t=1}^{n} x_t^2} \simeq \frac{1}{\eta_0}\left(\sum_{s=0}^{\infty} \alpha_s Y_{T+1+s} - \sum_{s=0}^{\infty} \alpha_s Y_{T-s}\right) \tag{13.1.12}$$

with $\mathrm{var}[\hat{\omega}] \simeq \sigma_a^2 (\eta_0)^{-1}$. The estimator $\hat{\omega}$ can thus be interpreted as a contrast between
two weighted moving averages, one consisting of the observations after the intervention
and the other for the observations before the intervention, where the weights $(\alpha_s/\eta_0)$ are
symmetrical.

For example, consider the case where $N_t$ follows the IMA(0,1,1) model, $(1 - B)N_t = (1 - \theta B)a_t$, so that $\tilde{\pi}(B) = (1 - \theta B)^{-1}$ with $\tilde{\pi}_i = -\theta^i$, $i \geq 1$. Then, $\eta_s = \theta^s/(1 - \theta^2)$, $s = 0, 1, \ldots$, and so $\alpha_s = (\theta^s - \theta^{s+1})/(1 - \theta^2) = \theta^s/(1 + \theta)$. Hence, the estimator in (13.1.12)
becomes

$$\hat{\omega} \simeq (1 - \theta)\left(\sum_{s=0}^{\infty} \theta^s Y_{T+1+s} - \sum_{s=0}^{\infty} \theta^s Y_{T-s}\right) \tag{13.1.12a}$$

with $\mathrm{var}[\hat{\omega}] \simeq \sigma_a^2(1 - \theta^2)$. The estimator $\hat{\omega}$ is thus a contrast between two exponentially
weighted moving averages, one consisting of the observations after the intervention and
the other for the observations before the intervention.
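
The equivalence between the regression form (13.1.11) and the exponentially weighted contrast (13.1.12a) can be checked numerically. The following R sketch simulates a series with IMA(0,1,1) noise and a step change, and computes the estimate both ways; the parameter values, the seed, and the helper `pi_filter` are illustrative assumptions, and $\theta$ is treated as known.

```r
set.seed(1)
n <- 200; T0 <- 100; theta <- 0.6; omega <- 5    # assumed illustrative values

a  <- rnorm(n)
N  <- cumsum(a - theta * c(0, a[-n]))            # (1 - B) N_t = (1 - theta*B) a_t
xi <- as.numeric(seq_len(n) >= T0 + 1)           # xi_t = B S_t^(T): step after time T0
Y  <- omega * xi + N

# Apply pi(B) = (1 - B) / (1 - theta*B), taking pre-sample values as zero
pi_filter <- function(v) {
  as.numeric(stats::filter(diff(c(0, v)), theta, method = "recursive"))
}
w <- pi_filter(Y)
x <- pi_filter(xi)                               # x_t = theta^(t - T0 - 1) for t > T0

omega_ls <- sum(x * w) / sum(x^2)                # regression form (13.1.11)

s <- 0:(n - T0 - 1)
omega_ewma <- (1 - theta) *
  (sum(theta^s * Y[T0 + 1 + s]) - sum(theta^s * Y[T0 - s]))   # contrast form (13.1.12a)

c(regression = omega_ls, ewma_contrast = omega_ewma)
c(sum_x2 = sum(x^2), eta_0 = 1 / (1 - theta^2))
```

With 100 observations on either side of the intervention, the truncation of the infinite sums is negligible and the two estimates agree to many decimal places.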

Now, as a second case, suppose that the noise instead follows the ARIMA(1,1,0)
model, so that $\tilde{\pi}(B) = (1 - \phi B)$ with $\tilde{\pi}_1 = \phi$ and $\tilde{\pi}_i = 0$ for $i > 1$. Then $\eta_0 = 1 + \phi^2$, $\eta_1 = -\phi$, and $\eta_s = 0$, $s > 1$. Hence, $\sum_{t=1}^{n} x_t^2 = 1 + \phi^2 = \eta_0$, $\alpha_0 = 1 + \phi + \phi^2$, $\alpha_1 = -\phi$, and it
follows that

$$\sum_{t=1}^{n} x_t w_t = (1 - \phi B)(1 - \phi F)(1 - B)Y_{T+1} = [(1 + \phi + \phi^2)Y_{T+1} - \phi Y_{T+2}] - [(1 + \phi + \phi^2)Y_T - \phi Y_{T-1}]$$

Thus, for this case we have

$$\hat{\omega} = \frac{\sum_{t=1}^{n} x_t w_t}{\sum_{t=1}^{n} x_t^2} = (1 + \phi^2)^{-1}\{[(1 + \phi + \phi^2)Y_{T+1} - \phi Y_{T+2}] - [(1 + \phi + \phi^2)Y_T - \phi Y_{T-1}]\} \tag{13.1.13}$$

with $\mathrm{var}[\hat{\omega}] = \sigma_a^2/(1 + \phi^2)$. Again, the estimator $\hat{\omega}$ can be viewed as a contrast between
two weighted averages of the same form, one of the postintervention observations $Y_{T+1}$
and $Y_{T+2}$ and the other of the preintervention observations $Y_T$ and $Y_{T-1}$, but the weighted
averages are only finite in extent because the noise model contains only an AR factor
$(1 - \phi B)$ and no MA factor as in the previous case.


Comparison with a Case with Stationary Noise. Finally, we consider a simpler situation of
model (13.1.9), in which the noise is stationary, for example, an AR(1) model $(1 - \phi B)N_t = a_t$.
In this situation we obtain $x_t = (1 - \phi B)B S_t^{(T)} = 1 - \phi$ for $t > T + 1$ with $x_{T+1} = 1$
and $w_t = (1 - \phi B)Y_t = Y_t - \phi Y_{t-1}$. Then, it readily follows that

$$\hat{\omega} = \frac{\sum_{t=1}^{n} x_t w_t}{\sum_{t=1}^{n} x_t^2} \simeq \frac{(1 - \phi)\sum_{t=T+1}^{n} (Y_t - \phi Y_{t-1})}{(n - T)(1 - \phi)^2} \simeq \bar{Y}_2 \tag{13.1.14}$$

where $\bar{Y}_2 = (n - T)^{-1} \sum_{t=T+1}^{n} Y_t$ denotes an unweighted average of all observations after
the intervention, with $\mathrm{var}[\hat{\omega}] = \sigma_a^2/[1 + (n - T - 1)(1 - \phi)^2] \simeq \sigma_a^2/[(n - T)(1 - \phi)^2]$.
Notice that because of the stationarity of the noise, we have an unweighted average of
postintervention observations and also that there is no adjustment for the preintervention
observations because they are assumed to be stationary about a known mean of zero. Also
note that in the stationary case, the variance of $\hat{\omega}$ decreases proportionally with $1/(n - T)$,
whereas in the previous nonstationary noise situations, $\mathrm{var}[\hat{\omega}]$ is essentially a constant not
dependent on the sample size. This reflects the differing degrees of accuracy in the estimators
of intervention model parameters, such as the level shift parameter $\omega$, that can be
expected in large samples between the nonstationary noise and the stationary noise model
situations.
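
The contrast between the two situations can be made concrete by evaluating $\sigma_a^2 / \sum x_t^2$ for increasing numbers of postintervention observations. The R sketch below does this for IMA(0,1,1) noise and for stationary AR(1) noise; the values of `theta`, `phi`, `sigma2`, and the grid `m` are arbitrary illustrative assumptions.

```r
sigma2 <- 1; theta <- 0.6; phi <- 0.6   # assumed illustrative values
m <- c(10, 50, 250, 1250)               # number of postintervention observations, n - T

# Nonstationary IMA(0,1,1) noise: x_t = theta^(t - T - 1) for t > T
var_nonstat <- sapply(m, function(k) sigma2 / sum(theta^(2 * (0:(k - 1)))))

# Stationary AR(1) noise: x_{T+1} = 1 and x_t = 1 - phi afterwards
var_stat <- sapply(m, function(k) sigma2 / sum(c(1, rep(1 - phi, k - 1))^2))

data.frame(n_minus_T = m, var_nonstat = var_nonstat, var_stat = var_stat)
```

In the first column of variances the values settle almost immediately at $\sigma_a^2(1 - \theta^2)$, while in the second they keep decreasing roughly in proportion to $1/(n - T)$.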

Specifically, in the model (13.1.9), with $\xi_t = B S_t^{(T)}$ equal to a step input, suppose that
the noise process $N_t$ is nonstationary ARIMA with $d = 1$, so that $\phi(B)(1 - B)N_t = \theta(B)a_t$.
Then, by applying the differencing operator $(1 - B)$, the model

$$Y_t = \omega B S_t^{(T)} + N_t \tag{13.1.15}$$

can also be expressed as

$$y_t = \omega B P_t^{(T)} + n_t \tag{13.1.16}$$

where $y_t = (1 - B)Y_t$ and $n_t = (1 - B)N_t$, and hence $n_t$ is a stationary ARMA($p$, $q$) process.
Therefore, the MLE of $\omega$ for the original model (13.1.15) with a (permanent) step input
effect and nonstationary noise ($d = 1$) will have features similar to the MLE in the model
(13.1.16), which has a (transitory) pulse input effect and stationary noise.

Of course, the model (13.1.9) can be generalized to allow for an unknown nonzero mean
$\omega_0$ before the intervention, $Y_t = \omega_0 + \omega \xi_t + N_t$, with $\xi_t = B S_t^{(T)}$, so that $\omega$ represents the
change in mean level after the intervention. Then, for the stationary AR(1) noise model
case, for example, similar to (13.1.14), it can be shown that the MLE of $\omega$ is $\hat{\omega} \simeq \bar{Y}_2 - \bar{Y}_1$,
where $\bar{Y}_1 = T^{-1} \sum_{t=1}^{T} Y_t$ denotes the sample mean of all preintervention observations.

13.2 OUTLIER ANALYSIS FOR TIME SERIES 

Time series observations may sometimes be affected by isolated events, disturbances, 
or errors that create spurious effects in the series and result in unusual patterns in the 
observations that are not consistent with the overall behavior of the time series. Such 
unusual observations may be referred to as outliers. They may be the result of unusual 
external events such as strikes, sudden political or economic changes, unusual weather 
events, sudden changes in a physical system, and so on, or simply due to recording or gross 
errors in measurement. The presence of such outliers in a time series can have substantial 
effects on the behavior of sample autocorrelations, partial autocorrelations, estimates of 
ARMA model parameters, and forecasting, and can even affect the specification of the 
model. If the time of occurrence T of an event that results in the outlying behavior is 
known, the unusual effects can often be accounted for by the use of intervention analysis 
techniques discussed in Section 13.1. However, since in practice the presence of outliers is 
often not known at the start of the analysis, additional procedures for detection of outliers 
and assessment of their possible impacts are important. In this section we discuss some 
useful models for representing outliers and corresponding methods, similar to the methods 
of intervention analysis, for detection of outliers. Some relevant references that deal with 
the topics of outlier detection, influence of outliers, and robust methods of estimation 
include Bruce and Martin (1989), Chang et al. (1988), Chen and Liu (1993), Martin and 
Yohai (1986), and Tsay (1986). 


13.2.1 Models for Additive and Innovational Outliers 

Following the work of Fox (1972), we consider two simple intervention models to represent
two different types of outliers that might occur in practice. These are the additive outlier
(AO) and the innovational outlier (IO) models. Let $z_t$ denote the underlying time series
process that is free of the impact of outliers, and let $Y_t$ denote the observed time series.
We assume that $z_t$ follows the ARIMA($p$, $d$, $q$) model $\varphi(B)z_t = \theta(B)a_t$. Then, an additive
outlier at time $T$, or “observational outlier,” is modeled as

$$Y_t = \omega P_t^{(T)} + z_t = \omega P_t^{(T)} + \frac{\theta(B)}{\varphi(B)}\,a_t \tag{13.2.1}$$

where $P_t^{(T)} = 1$ if $t = T$, $P_t^{(T)} = 0$ if $t \neq T$, denotes the pulse indicator at time $T$. An
innovational outlier at time $T$, or “innovational shock,” is modeled as

$$Y_t = \frac{\theta(B)}{\varphi(B)}\left(\omega P_t^{(T)} + a_t\right) = \omega\,\frac{\theta(B)}{\varphi(B)}\,P_t^{(T)} + z_t \tag{13.2.2}$$


Hence, an AO affects the level of the observed time series only at time $T$, $Y_T = \omega + z_T$,
by an unknown additive amount $\omega$, while an IO represents an extraordinary random shock
at time $T$, $a_T + \omega = a_T^*$, which affects all succeeding observations $Y_T, Y_{T+1}, \ldots$ through
the dynamics of the system described by $\psi(B) = \theta(B)/\varphi(B)$, such that $Y_t = \omega \psi_i + z_t$ for
$t = T + i \geq T$. For a stationary series, the effect of the IO is temporary since the $\psi_i$ decay
exponentially to 0, but for nonstationary series with $d \geq 1$, there can be permanent effects
that approach a level shift or even ramp effect since the $\psi_i$ do not decay to 0. More generally,
an observed time series $Y_t$ might be affected by outliers of different types at several points
of time $T_1, T_2, \ldots, T_k$, and the multiple outlier model of the following general form

$$Y_t = \sum_{j=1}^{k} \omega_j \nu_j(B) P_t^{(T_j)} + z_t \tag{13.2.3}$$

could be considered for use, where $\nu_j(B) = 1$ for an AO at time $T_j$ and $\nu_j(B) = \theta(B)/\varphi(B)$
for an IO at time $T_j$. Problems of interest associated with these outlier models are to identify
the timing and the type of outliers and to estimate the magnitude $\omega$ of the outlier effect, so
that the analysis of the time series will adjust for these outlier effects.
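
The different footprints of the two outlier types on the observed series can be seen in a small simulation. The R sketch below contaminates the same AR(1) realization with an AO and with an IO of equal size; the parameter values and the helper `ar1` are illustrative assumptions.

```r
set.seed(42)
n <- 100; T0 <- 50; phi <- 0.7; omega <- 4   # assumed illustrative values
a <- rnorm(n)
P <- as.numeric(seq_len(n) == T0)             # pulse indicator P_t^(T)

# AR(1) generator started from zero: z_t = phi * z_{t-1} + shock_t
ar1 <- function(shock) as.numeric(stats::filter(shock, phi, method = "recursive"))

z    <- ar1(a)                 # outlier-free series
Y_ao <- z + omega * P          # additive outlier, as in (13.2.1)
Y_io <- ar1(a + omega * P)     # innovational outlier, as in (13.2.2)

i <- 0:5
rbind(AO        = (Y_ao - z)[T0 + i],   # omega at time T only
      IO        = (Y_io - z)[T0 + i],   # omega * psi_i at time T + i
      omega_psi = omega * phi^i)        # psi_i = phi^i for an AR(1)
```

The AO row is nonzero only in its first entry, whereas the IO row decays geometrically, matching $\omega \psi_i$ for this stationary model.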

Tsay (1988), Chen and Tiao (1990), and Chen and Liu (1993), among others, also
consider allowance in (13.2.3) for a level shift type of outlier effect at unknown time of the
form $\omega S_t^{(T)}$. The occurrence of such an effect is often encountered in series where the
underlying process $z_t$ is nonstationary, and such that there is a factor $(1 - B)$ in the
AR operator $\varphi(B)$ of the ARIMA model for $z_t$. Then recall that $(1 - B)S_t^{(T)} = P_t^{(T)}$ so that
a level shift type of outlier effect for the nonstationary observed series $Y_t$ is equivalent to
an AO effect for the first differenced series $(1 - B)Y_t$.


13.2.2 Estimation of Outlier Effect for Known Timing of the Outlier 

We first consider the estimation of the impact $\omega$ of an AO in (13.2.1) and that of an IO in
(13.2.2), respectively, in the situation where the parameters of the time series model for
the underlying process $z_t$ are assumed known. To motivate iterative procedures that have
been proposed for the general case, it will also be assumed that the timing $T$ of the outlier
is given.

Let $\pi(B) = \theta^{-1}(B)\varphi(B) = 1 - \sum_{i=1}^{\infty} \pi_i B^i$ and define $e_t = \pi(B)Y_t$ for $t = 1, 2, \ldots, n$, in
terms of the observed series $Y_t$. Then we can write the above outlier models, (13.2.2) and
(13.2.1), respectively, as

$$\text{IO:} \quad e_t = \omega P_t^{(T)} + a_t \tag{13.2.4a}$$

$$\text{AO:} \quad e_t = \omega \pi(B) P_t^{(T)} + a_t = \omega x_{1t} + a_t \tag{13.2.4b}$$

where for the AO model, $x_{1t} = \pi(B)P_t^{(T)} = -\pi_i$ if $t = T + i \geq T$, $x_{1t} = 0$ if $t < T$, with
$\pi_0 = -1$. Thus, we see from (13.2.4) that the information about an IO is contained solely
in the “residual” $e_T$ at the particular time $T$, whereas that for an AO is spread over the
stretch of residuals $e_T, e_{T+1}, e_{T+2}, \ldots$, with generally decreasing weights $1, -\pi_1, -\pi_2, \ldots$,
because the $\pi_i$ are absolutely summable due to the invertibility of the MA operator $\theta(B)$.
Equivalently, when an AO is present at time $T$, we can see that the residuals constructed
from the observed series $Y_t$, for $t \geq T$, will be affected as $e_t = \pi(B)Y_t = a_t - \omega \pi_i$ for
$t = T + i$. Hence, in the presence of an AO, a relatively high proportion of the constructed
residuals could be influenced and distorted relative to the underlying white noise series $a_t$.
Consequently, the presence of AOs that are unaccounted for typically tends to have a much
more substantial adverse effect on estimates of the autocorrelations and parameters of the
ARMA model for $z_t$ compared to the presence of innovational outliers.

From least-squares principles, the least-squares estimator of the outlier impact $\omega$ in the
IO model is simply the residual at time $T$,

$$\text{IO:} \quad \hat{\omega}_{I,T} = e_T \tag{13.2.5a}$$

with $\mathrm{var}[\hat{\omega}_{I,T}] = \sigma_a^2$, while that in the AO model is the linear combination of $e_T, e_{T+1}, \ldots$,

$$\text{AO:} \quad \hat{\omega}_{A,T} = \frac{e_T - \sum_{i=1}^{n-T} \pi_i e_{T+i}}{\sum_{i=0}^{n-T} \pi_i^2} = \frac{\pi^*(F)e_T}{\tau^2} \tag{13.2.5b}$$

with $\mathrm{var}[\hat{\omega}_{A,T}] = \sigma_a^2/\tau^2$, where $\tau^2 = \sum_{i=0}^{n-T} \pi_i^2$ and $\pi^*(F) = 1 - \pi_1 F - \pi_2 F^2 - \cdots - \pi_{n-T}F^{n-T}$. The notation in (13.2.5) reflects the fact that the estimates depend upon
the time $T$. Note that in an underlying autoregressive model $\varphi(B)z_t = a_t$, since then
$\pi^*(B) = \pi(B) = \varphi(B)$ for $T < n - p - d$, and $e_t = \varphi(B)Y_t$, in terms of the observations $Y_t$,
the estimate $\hat{\omega}_{A,T}$ in (13.2.5b) can be written as

$$\hat{\omega}_{A,T} = \frac{\varphi(F)\varphi(B)Y_T}{\tau^2}$$

Since $\tau^2 \geq 1$, it is seen in general that $\mathrm{var}[\hat{\omega}_{A,T}] \leq \mathrm{var}[\hat{\omega}_{I,T}] = \sigma_a^2$, and in some cases
$\mathrm{var}[\hat{\omega}_{A,T}]$ can be much smaller than $\sigma_a^2$. For example, in an MA(1) model for $z_t$, the
variance of $\hat{\omega}_{A,T}$ would be $\sigma_a^2(1 - \theta^2)/(1 - \theta^{2(n-T+1)}) \simeq \sigma_a^2(1 - \theta^2)$ when $n - T$ is large.

Significance tests for the presence of an outlier of type AO or IO at the given time $T$
can be formulated as a test of $\omega = 0$ in either model (13.2.1) or (13.2.2), against $\omega \neq 0$.
The likelihood ratio test criteria can be derived for both situations and essentially take the
form of the standardized statistics

$$\lambda_{I,T} = \frac{\hat{\omega}_{I,T}}{\sigma_a} \quad \text{and} \quad \lambda_{A,T} = \frac{\tau\,\hat{\omega}_{A,T}}{\sigma_a} \tag{13.2.6}$$

respectively, for IO and AO types. Under the null hypothesis that $\omega = 0$, both statistics in
(13.2.6) will have the standard normal distribution.
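
For an AR(1) model with known parameters, the estimators in (13.2.5) and the statistics in (13.2.6) reduce to a few lines of arithmetic. The R sketch below places an additive outlier in a simulated AR(1) series and evaluates both sets of quantities at the known time; all numerical values are illustrative assumptions.

```r
set.seed(7)
n <- 150; T0 <- 80; phi <- 0.6; sigma_a <- 1; omega <- 5   # assumed values
a <- rnorm(n, sd = sigma_a)
z <- as.numeric(stats::filter(a, phi, method = "recursive"))
Y <- z
Y[T0] <- Y[T0] + omega                        # additive outlier at time T0

e <- Y - phi * c(0, Y[-n])                    # e_t = pi(B) Y_t with pi(B) = 1 - phi*B

# For an AR(1), pi_1 = phi and pi_i = 0 for i > 1, so tau^2 = 1 + phi^2 when T0 < n
tau2 <- 1 + phi^2
omega_I <- e[T0]                              # (13.2.5a)
omega_A <- (e[T0] - phi * e[T0 + 1]) / tau2   # (13.2.5b)

# Equivalent form of the AO estimate in terms of the observations
omega_A_alt <- ((1 + phi^2) * Y[T0] - phi * (Y[T0 - 1] + Y[T0 + 1])) / tau2

lambda_I <- omega_I / sigma_a                 # (13.2.6)
lambda_A <- sqrt(tau2) * omega_A / sigma_a

c(omega_I = omega_I, omega_A = omega_A, lambda_I = lambda_I, lambda_A = lambda_A)
```

Because the contamination here is additive, the AO estimate pools information from $e_T$ and $e_{T+1}$ and has the smaller variance, $\sigma_a^2/(1 + \phi^2)$.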

For the level-shift-type outlier model $Y_t = \omega S_t^{(T)} + z_t$, we have $e_t = \omega \pi(B)S_t^{(T)} + a_t$
and

$$\pi(B)S_t^{(T)} = \frac{\pi(B)}{1 - B}\,P_t^{(T)} \equiv \tilde{\pi}(B)P_t^{(T)}$$

with $\tilde{\pi}(B) = \pi(B)/(1 - B) = 1 - \sum_{j=1}^{\infty} \tilde{\pi}_j B^j$. So it follows from the estimation results in
(13.2.4b) and (13.2.5b) that the MLE of $\omega$ in the level shift model is $\hat{\omega}_{L,T} = \tilde{\pi}^*(F)e_T/\tilde{\tau}^2$
with

$$\tilde{\pi}^*(F) = 1 - \tilde{\pi}_1 F - \tilde{\pi}_2 F^2 - \cdots - \tilde{\pi}_{n-T}F^{n-T}$$

and $\tilde{\tau}^2 = 1 + \tilde{\pi}_1^2 + \cdots + \tilde{\pi}_{n-T}^2$. When $d = 1$ in the ARIMA model, $\varphi(B) = \phi(B)(1 - B)$
and $\tilde{\pi}(B) = \theta^{-1}(B)\phi(B)$, and, as discussed earlier, the results for this situation are the same
as for the AO in terms of the model for the first differences:

$$(1 - B)Y_t = \omega(1 - B)S_t^{(T)} + (1 - B)z_t = \omega P_t^{(T)} + \frac{\theta(B)}{\phi(B)}\,a_t$$


13.2.3 Iterative Procedure for Outlier Detection 

In practice, the time T of a possible outlier as well as the model parameters are unknown. 
To address the problem of detection of outliers at unknown times, iterative procedures that 
are relatively convenient computationally have been proposed by Chang et al. (1988), Tsay 
(1986), and Chen and Liu (1993) to identify and adjust for the effects of outliers. 

At the first stage of this procedure, the ARIMA model is estimated for the observed
time series $Y_t$ in the usual way, assuming that the series contains no outliers. The residuals
$\hat{e}_t$ from the model are obtained as $\hat{e}_t = \hat{\theta}^{-1}(B)\hat{\varphi}(B)Y_t = \hat{\pi}(B)Y_t$, and $\hat{\sigma}_a^2 = n^{-1}\sum_{t=1}^{n} \hat{e}_t^2$ is
obtained. Then the statistics, as in (13.2.6),

$$\hat{\lambda}_{I,t} = \frac{\hat{\omega}_{I,t}}{\hat{\sigma}_a} \quad \text{and} \quad \hat{\lambda}_{A,t} = \frac{\hat{\tau}\,\hat{\omega}_{A,t}}{\hat{\sigma}_a}$$

are computed for each time $t = 1, 2, \ldots, n$, as well as

$$\hat{\lambda}_T = \max_t\left[\max\left(|\hat{\lambda}_{I,t}|, |\hat{\lambda}_{A,t}|\right)\right]$$

where $T$ denotes the time when this maximum occurs. The possibility of an outlier of
type IO is identified at time $T$ if $\hat{\lambda}_T = |\hat{\lambda}_{I,T}| > c$, where $c$ is a prespecified constant with
typical values for $c$ of 3.0, 3.5, or 4.0. The effect of this IO can be eliminated from the
residuals by defining $\tilde{e}_T = \hat{e}_T - \hat{\omega}_{I,T} = 0$ at $T$. If $\hat{\lambda}_T = |\hat{\lambda}_{A,T}| > c$, the possibility of an AO
is identified at $T$, and its impact is estimated by $\hat{\omega}_{A,T}$ as in (13.2.5b). The effect of this AO
can be removed from the residuals by defining $\tilde{e}_t = \hat{e}_t - \hat{\omega}_{A,T}\hat{\pi}(B)P_t^{(T)} = \hat{e}_t + \hat{\omega}_{A,T}\hat{\pi}_{t-T}$
for $t \geq T$. In either case, a new estimate $\tilde{\sigma}_a^2$ is computed from the modified residuals $\tilde{e}_t$.

If any outliers are identified, the modified residuals $\tilde{e}_t$ and modified estimate $\tilde{\sigma}_a^2$, but
the same parameters $\hat{\pi}(B) = \hat{\theta}^{-1}(B)\hat{\varphi}(B)$, are used to compute new statistics $\hat{\lambda}_{I,t}$ and $\hat{\lambda}_{A,t}$.
The preceding steps are then repeated until all outliers are identified. Suppose that this
procedure identifies outliers at $k$ time points $T_1, T_2, \ldots, T_k$. Then the overall outlier model,
as in (13.2.3),

$$Y_t = \sum_{j=1}^{k} \omega_j \nu_j(B) P_t^{(T_j)} + \frac{\theta(B)}{\varphi(B)}\,a_t \tag{13.2.7}$$

is estimated for the observed series $Y_t$, where $\nu_j(B) = 1$ for an AO and $\nu_j(B) = \theta(B)/\varphi(B)$
for an IO at time $T_j$. A revised set of residuals

$$\hat{e}_t = \hat{\theta}^{-1}(B)\hat{\varphi}(B)\left[Y_t - \sum_{j=1}^{k} \hat{\omega}_j \hat{\nu}_j(B) P_t^{(T_j)}\right]$$

and a new $\hat{\sigma}_a^2$ are obtained from this fitted model. The previous steps of the procedure can
then be repeated with new residuals, until all outliers are identified and a final model of
the general form of (13.2.7) is estimated. If desired, a modified time series of observations
in which the effects of the outliers have been removed can be constructed as $\tilde{z}_t = Y_t - \sum_{j=1}^{k} \hat{\omega}_j \hat{\nu}_j(B) P_t^{(T_j)}$.

The procedure above can be implemented, with few modifications, in any existing
software capable of estimation of ARIMA and transfer function-noise models. An implementation
using the TSA package in R is demonstrated below. The technique can be a useful
tool in the identification of potential time series outliers that if undetected could have a 
negative impact on the effectiveness of modeling and estimation. However, there should 
be some cautions concerning the systematic use of such “outlier adjustment” procedures, 
particularly with regard to the overall interpretation of results, the appropriateness of a 
general model specification for “outliers” such as (13.2.7), which treats the outliers as 
deterministic constants, and the possibilities for “overspecification” in the number of 
outliers. Whenever possible, it would always be highly desirable to search for the causes 
or sources of the outliers that may be identified by the foregoing procedure, so that the 
outlying behavior can be better understood and properly accounted for in the analysis. 
Also, although the foregoing procedures should perform well when the series has only a 
few relatively isolated outliers, there could be difficulties due to “masking effects” when 
the series has multiple outliers that occur in patches, especially when they are in the form 
of additive outliers and level shift effects. Modifications to the basic procedure to help 
remedy these difficulties associated with multiple outliers, including joint estimation of 
all identified outlier effects and the model parameters within the iteration stages, were 
proposed by Chen and Liu (1993). 
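
The inner loop of the detection procedure can be sketched in a few lines of base R. The function below is a deliberately simplified illustration, not the procedure of any of the cited papers: it assumes an AR(1) model with a known parameter `phi`, so the model is never re-estimated, and the name `detect_outliers_ar1` and its arguments are hypothetical.

```r
# Simplified detection loop for an AR(1) with known phi (illustrative assumption)
detect_outliers_ar1 <- function(Y, phi, crit = 3.5, max_iter = 10) {
  n <- length(Y)
  e <- Y - phi * c(0, Y[-n])                 # residuals e_t = (1 - phi*B) Y_t
  tau2 <- c(rep(1 + phi^2, n - 1), 1)        # tau^2 at each t (only pi_0 remains at t = n)
  times <- integer(0); types <- character(0)
  omegas <- numeric(0); lambdas <- numeric(0)
  for (iter in seq_len(max_iter)) {
    sigma <- sqrt(mean(e^2))                 # current estimate of sigma_a
    omega_I <- e
    omega_A <- (e - phi * c(e[-1], 0)) / tau2
    lam_I <- omega_I / sigma
    lam_A <- sqrt(tau2) * omega_A / sigma
    lam_max <- pmax(abs(lam_I), abs(lam_A))
    T_hat <- which.max(lam_max)
    if (lam_max[T_hat] <= crit) break        # nothing exceeds the critical value
    if (abs(lam_I[T_hat]) >= abs(lam_A[T_hat])) {
      times <- c(times, T_hat); types <- c(types, "IO")
      omegas <- c(omegas, omega_I[T_hat]); lambdas <- c(lambdas, lam_I[T_hat])
      e[T_hat] <- 0                          # remove the IO effect from the residual
    } else {
      w <- omega_A[T_hat]
      times <- c(times, T_hat); types <- c(types, "AO")
      omegas <- c(omegas, w); lambdas <- c(lambdas, lam_A[T_hat])
      e[T_hat] <- e[T_hat] - w               # remove the AO effect at T ...
      if (T_hat < n) e[T_hat + 1] <- e[T_hat + 1] + phi * w   # ... and at T + 1
    }
  }
  data.frame(time = times, type = types, omega = omegas, lambda = lambdas,
             stringsAsFactors = FALSE)
}

# Simulated AR(1) series with an assumed additive outlier of size 12 at t = 120
set.seed(2024)
n <- 200; phi <- 0.8
z <- as.numeric(stats::filter(rnorm(n), phi, method = "recursive"))
Y <- z
Y[120] <- Y[120] + 12
res <- detect_outliers_ar1(Y, phi)
res
```

In a full implementation the model parameters and the outlier effects would be re-estimated jointly after each detection cycle, as described above; the sketch only shows how the statistics are scanned and how the residuals are adjusted.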


13.2.4 Examples of Analysis of Outliers 

We consider two numerical examples to illustrate the application of the outlier analysis 
procedures, discussed in the previous sections. For computational convenience, conditional 
least-squares estimation methods are used throughout in these examples. 

Series D. The first example involves Series D, which represents “uncontrolled” viscosity
readings every hour from a chemical process. In Chapter 7, an AR(1) model $(1 - \phi B)z_t = \theta_0 + a_t$
has been suggested and fitted to this series. In the outlier detection procedure, the
model is first estimated assuming that no outliers are present, and the results are given in
Table 13.1(a). Then the AO and IO statistics as in (13.2.6) are computed for each time
point $t$, using $\hat{\sigma}_a^2 = 0.08949$. Based on a critical value of $c = 3.5$, we are led to identification
of an IO of rather large magnitude at time $T = 217$. The effect of this IO is removed by
modifying the residual at $T$, a new estimate $\tilde{\sigma}_a^2 = 0.08414$ is obtained, and new outlier
statistics are computed using $\tilde{\sigma}_a$. At this stage, no outliers are identified. Then, the time
series parameters and the outlier parameter $\omega$ in model (13.2.2), that is, in the model

$$Y_t = \frac{1}{1 - \phi B}\left[\theta_0 + \omega P_t^{(T)} + a_t\right]$$

are estimated simultaneously, and the estimates are given in Table 13.1(a). Repeating the
outlier detection procedure based on these new parameter estimates and corresponding
residuals does not reveal any other outliers. Hence, only one extreme IO is identified, and
adjusting for this IO does not result in much change in the estimate $\hat{\phi}$ of the time series
model parameter, but gives about a 6% reduction in the estimate of $\sigma_a^2$. Several other
potential outliers, at times $t = 29, 113, 115, 171, 268$, and $272$, were also suggested during
the outlier procedure as having values of the test statistics $\hat{\lambda}$ slightly greater than 3.0 in
absolute value, but adjustment for such values did not affect the estimates of the model
substantially.

TABLE 13.1 Outlier Detection and Parameter Estimation Results for Series C and D Examples

| | $\hat{\theta}_0$ (a) | $\hat{\phi}$ (a) | $\hat{\omega}_1$ (a) | $\hat{\omega}_2$ (a) | $\hat{\omega}_3$ (a) | $\hat{\sigma}_a^2$ | Outlier time | Outlier $\hat{\omega}$ | Outlier $\hat{\lambda}$ | Outlier type |
|---|---|---|---|---|---|---|---|---|---|---|
| (a) Series D | | | | | | | | | | |
| Cycle 1 | 1.269 (0.258) | 0.862 (0.028) | | | | 0.0895 | 217 | -1.28 | -4.29 | IO |
| Final | 1.181 (0.251) | 0.872 (0.027) | -1.296 (0.292) | | | 0.0841 | | | | |
| (b) Series C | | | | | | | | | | |
| Cycle 1 | | 0.813 (0.038) | | | | 0.0179 | 58 | 0.76 | 5.65 | IO |
| | | | | | | | 59 | -0.51 | -4.16 | IO |
| | | | | | | | 60 | -0.44 | -3.74 | IO |
| Final | | 0.851 (0.035) | 0.745 (0.116) | -0.551 (0.120) | -0.455 (0.116) | 0.0132 | | | | |

(a) Standard errors of parameter estimates are in parentheses.


Series C. The second example we consider is Series C, the “uncontrolled” temperature
readings every minute in a chemical process. The model previously identified and fitted to
this series is the ARIMA(1,1,0) model, $(1 - \phi B)(1 - B)z_t = a_t$. The estimation results for
this model obtained assuming there are no outliers are given in Table 13.1(b). Proceeding
with the sequence of calculations of the outlier test statistics and using the critical value
of $c = 3.5$, we first identify an IO at time 58. The residual at time 58 is modified, we
obtain a new estimate $\tilde{\sigma}_a^2 = 0.01521$, and next an IO at time 59 is identified. This residual
is modified, a new estimate $\tilde{\sigma}_a^2 = 0.01409$ is obtained, and then another IO at time 60 is
indicated. After this, no further outliers are identified. These innovational outliers at times
58, 59, and 60 are rather apparent in Figure 13.2(a), which shows a time series plot of the
residuals from the initial model fit before any adjustment for outliers.

Then the time series outlier model

$$(1 - B)Y_t = \frac{1}{1 - \phi B}\left[\omega_1 P_t^{(58)} + \omega_2 P_t^{(59)} + \omega_3 P_t^{(60)} + a_t\right]$$

is estimated for the series, and the results are presented in Table 13.1(b). The residuals
are shown in Figure 13.2(b). No other outliers are detected when the outlier procedure
is repeated with the new model parameter estimates. In this example we see that adjustment
for the outliers has a little more effect on the estimate $\hat{\phi}$ of the time series
parameter than in the previous case, and it reduces the estimate of $\sigma_a^2$ substantially by
about 26%. Figure 13.2(b) clearly shows the reduction in variability due to the outlier
adjustment.

Calculations Using R. The detection and adjustment for outliers in time series can be
performed using the TSA package in R. The code needed to do the analysis for Series C
and D is as follows:

> library(TSA)
> # Series C: ARIMA(1,1,0) fit without outlier adjustment
> m1.C=arima(seriesC,order=c(1,1,0))
> m1.C
> # Screen the residuals for additive and innovational outliers
> detectAO(m1.C); detectIO(m1.C)
> # Refit with innovational outliers at t = 58, 59, and 60
> m2.C=arimax(seriesC,order=c(1,1,0),io=c(58,59,60))
> m2.C
> # Series D: AR(1) fit, outlier screening, and refit with an IO at t = 217
> m1.D=arima(seriesD,order=c(1,0,0))
> m1.D
> detectAO(m1.D); detectIO(m1.D)
> m2.D=arimax(seriesD,order=c(1,0,0),io=c(217))
> m2.D

Figure 13.2, which shows the residuals for Series C before and after the outlier adjustment,
can be reproduced in R as follows:

> par(mfrow=c(2,1))
> plot(m1.C$residuals,ylim=c(-0.5,0.8),
+   main='(a) Residuals before outlier adjustment')
> plot(m2.C$residuals,ylim=c(-0.5,0.8),
+   main='(b) Residuals after outlier adjustment')

FIGURE 13.2 Residuals from the ARIMA(1, 1, 0) model fitted to Series C before and after
adjustment for innovational outliers at t = 58, 59, and 60. Panels: (a) residuals before outlier
adjustment; (b) residuals after outlier adjustment. Horizontal axis: Time.


13.3 ESTIMATION FOR ARMA MODELS WITH MISSING VALUES 

In some situations in practice, the values of a time series $z_t$ may not be observed at equally
spaced times because there may be “missing values” corresponding to certain time points.
In this section we discuss briefly the maximum likelihood estimation of parameters in an
ARIMA($p$, $d$, $q$) model for such situations, through consideration of the calculation of the
exact Gaussian likelihood function for the observed data. It is shown that for series with
missing observations, the likelihood function can conveniently be constructed using the
state-space form of the model and associated Kalman filtering procedures, as discussed in
Sections 5.5 and 7.4, but modified to accommodate the missing data. These methods for
evaluation of the likelihood in cases of irregularly spaced observations have been examined
by Jones (1980), Harvey and Pierse (1984), Ansley and Kohn (1983, 1985), and Wincek
and Reinsel (1986), among others. We also address briefly the related issue of estimation
of the missing values in the time series.


13.3.1 State-Space Model and Kalman Filter with Missing Values 

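We suppose $n$ observations are available at integer times $t_1 < t_2 < \cdots < t_n$, not equally
spaced, from an ARIMA($p$, $d$, $q$) process, which follows the model $\phi(B)(1 - B)^d z_t = \theta(B)a_t$.
From Section 5.5.1, the process $z_t$ has the state-space formulation given by

$$\mathbf{Y}_t = \boldsymbol{\Phi}\mathbf{Y}_{t-1} + \boldsymbol{\Psi}a_t \tag{13.3.1}$$

with $z_t = \mathbf{H}\mathbf{Y}_t = [1, 0, \ldots, 0]\mathbf{Y}_t$, where $\mathbf{Y}_t$ is the $r$-dimensional state vector and $r = \max(p + d, q + 1)$.
Let $\Delta_i = t_i - t_{i-1}$ denote the time difference between successive observations
$z_{t_{i-1}}$ and $z_{t_i}$, $i = 2, \ldots, n$. By successive substitutions, $\Delta_i$ times, on the right-hand side of
(13.3.1), we obtain

$$\mathbf{Y}_{t_i} = \boldsymbol{\Phi}^{\Delta_i}\mathbf{Y}_{t_{i-1}} + \sum_{j=0}^{\Delta_i - 1} \boldsymbol{\Phi}^j \boldsymbol{\Psi} a_{t_i - j} \equiv \boldsymbol{\Phi}_i^* \mathbf{Y}_{t_{i-1}} + \mathbf{a}_{t_i}^* \tag{13.3.2}$$

where $\boldsymbol{\Phi}_i^* = \boldsymbol{\Phi}^{\Delta_i}$ and $\mathbf{a}_{t_i}^* = \sum_{j=0}^{\Delta_i - 1} \boldsymbol{\Phi}^j \boldsymbol{\Psi} a_{t_i - j}$ with

$$\mathrm{cov}[\mathbf{a}_{t_i}^*] = \boldsymbol{\Sigma}_i = \sigma_a^2 \sum_{j=0}^{\Delta_i - 1} \boldsymbol{\Phi}^j \boldsymbol{\Psi}\boldsymbol{\Psi}' \boldsymbol{\Phi}'^{\,j}$$

Thus, (13.3.2) together with the observation equation $z_{t_i} = \mathbf{H}\mathbf{Y}_{t_i}$ constitutes a state-space
model form for the observed time series data $z_{t_1}, z_{t_2}, \ldots, z_{t_n}$.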

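Therefore, the Kalman filter recursive equations as in (5.5.6) to (5.5.9) can be directly
employed to obtain the state predictors $\hat{\mathbf{Y}}_{t_i|t_{i-1}}$ and their error covariance matrices $\mathbf{V}_{t_i|t_{i-1}}$.
So we can obtain the predictors

$$\hat{z}_{t_i|t_{i-1}} = E[z_{t_i}\,|\,z_{t_{i-1}}, \ldots, z_{t_1}] = \mathbf{H}\hat{\mathbf{Y}}_{t_i|t_{i-1}} \tag{13.3.3}$$

for the observations $z_{t_i}$ based on the previous observed data and their error variances

$$\sigma_a^2 v_i = \mathbf{H}\mathbf{V}_{t_i|t_{i-1}}\mathbf{H}' = E\left[(z_{t_i} - \hat{z}_{t_i|t_{i-1}})^2\right] \tag{13.3.4}$$

readily from the recursive Kalman filtering procedure. More specifically, the updating
equations (5.5.6) and (5.5.7) in this missing data setting take the form

$$\hat{\mathbf{Y}}_{t_i|t_i} = \hat{\mathbf{Y}}_{t_i|t_{i-1}} + \mathbf{K}_i\left(z_{t_i} - \mathbf{H}\hat{\mathbf{Y}}_{t_i|t_{i-1}}\right) \tag{13.3.5}$$

with

$$\mathbf{K}_i = \mathbf{V}_{t_i|t_{i-1}}\mathbf{H}'\left[\mathbf{H}\mathbf{V}_{t_i|t_{i-1}}\mathbf{H}'\right]^{-1} \tag{13.3.6}$$

while the prediction equations (5.5.8) are given by

$$\hat{\mathbf{Y}}_{t_i|t_{i-1}} = \boldsymbol{\Phi}_i^*\hat{\mathbf{Y}}_{t_{i-1}|t_{i-1}} = \boldsymbol{\Phi}^{\Delta_i}\hat{\mathbf{Y}}_{t_{i-1}|t_{i-1}}, \qquad \mathbf{V}_{t_i|t_{i-1}} = \boldsymbol{\Phi}_i^*\mathbf{V}_{t_{i-1}|t_{i-1}}\boldsymbol{\Phi}_i^{*\prime} + \boldsymbol{\Sigma}_i \tag{13.3.7}$$

with

$$\mathbf{V}_{t_i|t_i} = [\mathbf{I} - \mathbf{K}_i\mathbf{H}]\mathbf{V}_{t_i|t_{i-1}} \tag{13.3.8}$$

Notice that the calculation of the prediction equations (13.3.7) can be interpreted as computation
of the successive one-step-ahead predictions:

$$\hat{\mathbf{Y}}_{t_{i-1}+j|t_{i-1}} = \boldsymbol{\Phi}\hat{\mathbf{Y}}_{t_{i-1}+j-1|t_{i-1}}, \qquad \mathbf{V}_{t_{i-1}+j|t_{i-1}} = \boldsymbol{\Phi}\mathbf{V}_{t_{i-1}+j-1|t_{i-1}}\boldsymbol{\Phi}' + \sigma_a^2\boldsymbol{\Psi}\boldsymbol{\Psi}'$$

for $j = 1, \ldots, \Delta_i$, without any updating since there are no observations available between
the time points $t_{i-1}$ and $t_i$ to provide any additional information for updating.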


Exact Likelihood Function with Missing Values. The exact likelihood for the vector of
observations $\mathbf{z}' = (z_{t_1}, z_{t_2}, \ldots, z_{t_n})$ is obtained directly from the quantities in (13.3.3) and
(13.3.4) because the joint density of $\mathbf{z}$ can be expressed as the product of the conditional
densities of the $z_{t_i}$, given $z_{t_{i-1}}, \ldots, z_{t_1}$, for $i = 2, \ldots, n$, which are Gaussian with
conditional means and variances given by (13.3.3) and (13.3.4). Hence, the joint density of the
observations $\mathbf{z}$ can be expressed as

$$p(\mathbf{z} \mid \boldsymbol{\phi}, \boldsymbol{\theta}, \sigma_a^2) = \prod_{i=1}^{n} (2\pi \sigma_a^2 v_i)^{-1/2} \exp\left[ -\frac{1}{2\sigma_a^2} \sum_{i=1}^{n} \frac{(z_{t_i} - \hat{z}_{t_i|t_{i-1}})^2}{v_i} \right] \tag{13.3.9}$$


In (13.3.9), the quantities $\hat{z}_{t_i|t_{i-1}}$ and $\sigma_a^2 v_i$ are directly determined from the recursive filtering
calculations (13.3.5)-(13.3.8). In the case of a stationary ARMA($p$, $q$) model, the initial
conditions required to start the filtering procedure can be determined readily (see, for
example, Jones (1980) and Section 5.5.2). However, for the nonstationary ARIMA model
situation, some additional assumptions need to be specified concerning the process and the 
initial conditions. Appropriate methods for such cases have been examined by Ansley and 
Kohn (1985). 

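As a simple example to illustrate the missing data methods, consider the stationary
AR(1) model $(1 - \phi B) z_t = a_t$. Then, (13.3.2) directly becomes (see, for example, Reinsel
and Wincek, 1987)

$$z_{t_i} = \phi^{\Delta_i} z_{t_{i-1}} + \sum_{j=0}^{\Delta_i - 1} \phi^{j} a_{t_i - j} \tag{13.3.10}$$

and it is readily determined that

$$\hat{z}_{t_i|t_{i-1}} = \phi^{\Delta_i} z_{t_{i-1}} \quad \text{and} \quad \sigma_a^2 v_i = \frac{\sigma_a^2 (1 - \phi^{2\Delta_i})}{1 - \phi^2} \tag{13.3.11}$$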


Hence, the likelihood for the observed data in the first-order autoregressive model with
missing values is as given in (13.3.9), with these expressions for $\hat{z}_{t_i|t_{i-1}}$ and $\sigma_a^2 v_i$.
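
The AR(1) likelihood with missing values can be evaluated in a few lines. The sketch below is an illustration rather than the authors' code; it assumes a zero-mean stationary AR(1), integer observation times, and a start-up from the stationary marginal distribution of the first observation (so that $v_1 = 1/(1 - \phi^2)$). The function name `ar1_missing_loglik` and the data values are hypothetical.

```python
import math


def ar1_missing_loglik(times, z, phi, sigma2):
    """Log of (13.3.9) for a zero-mean stationary AR(1) observed at `times`."""
    if len(times) != len(z) or len(z) == 0:
        raise ValueError("times and z must be non-empty and of equal length")
    if not abs(phi) < 1.0:
        raise ValueError("stationarity requires |phi| < 1")
    loglik = 0.0
    for i, (t, zt) in enumerate(zip(times, z)):
        if i == 0:
            # Assumed start-up: stationary marginal distribution of z_{t_1}.
            pred, v = 0.0, 1.0 / (1.0 - phi ** 2)
        else:
            delta = t - times[i - 1]
            if delta < 1:
                raise ValueError("times must be strictly increasing integers")
            pred = phi ** delta * z[i - 1]                     # mean in (13.3.11)
            v = (1.0 - phi ** (2 * delta)) / (1.0 - phi ** 2)  # v_i in (13.3.11)
        var = sigma2 * v
        loglik -= 0.5 * (math.log(2.0 * math.pi * var) + (zt - pred) ** 2 / var)
    return loglik


# Observations at t = 1, 2, 4, 5, 8 (t = 3, 6, 7 missing); values are made up.
example = ar1_missing_loglik([1, 2, 4, 5, 8], [0.3, -0.1, 0.8, 0.4, -0.6],
                             phi=0.6, sigma2=1.0)
```

Maximizing this function over `phi` and `sigma2` with a general-purpose optimizer would give the exact ML estimates under the stated assumptions.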

13.3.2 Estimation of Missing Values of an ARMA Process

A related problem of interest that often arises in the context of missing values for time series 
is that of estimating the missing values. Studies based on interpolation of missing values 
for ARIMA time series from a least-squares viewpoint were performed by Brubacher and 
Tunnicliffe Wilson (1976), Damsleth (1980), and Abraham (1981). Within the framework 
of the state-space formulation, estimates of missing values and their corresponding error 
variances can be derived conveniently through the use of recursive smoothing methods 
associated with the Kalman filter, which were discussed briefly in Section 5.5.3 and are 
described in general terms in Anderson and Moore (1979), for example. These methods 
have been considered more specifically for the ARIMA model with missing values by 
Harvey and Pierse (1984) and by Kohn and Ansley (1986). 

For the special case of a pure autoregressive model, $\phi(B) z_t = a_t$, some rather simple
and explicit interpolation results are available. For example, in an AR($p$) process with a
single missing value at time $T$ surrounded by at least $p$ consecutive observed values both
before and after time $T$, it is well known (see, for example, Brubacher and Tunnicliffe
Wilson, 1976) that the optimal interpolation of the missing value $z_T$ is given by

$$\hat{z}_T = -d_0^{-1} \sum_{j=1}^{p} d_j (z_{T-j} + z_{T+j}) \tag{13.3.12}$$

where $d_j = \sum_{i=j}^{p} \phi_i \phi_{i-j}$, $\phi_0 = -1$, and $d_0 = 1 + \sum_{i=1}^{p} \phi_i^2$, with $E[(z_T - \hat{z}_T)^2] = \sigma_a^2 d_0^{-1} = \sigma_a^2 (1 + \sum_{i=1}^{p} \phi_i^2)^{-1}$. Notice that the value in (13.3.12) can be expressed as $\hat{z}_T = z_T - [\phi(F)\phi(B) z_T / d_0]$, with interpolation error equal to

$$\hat{e}_T = z_T - \hat{z}_T = d_0^{-1} \phi(F)\phi(B) z_T \tag{13.3.13}$$


As one way to establish the result (13.3.12), for convenience of discussion, suppose
that $z_T$ is the only missing value among times $t = 1, \ldots, n$, with $p + 1 \le T \le n - p$.
Using a normal distribution assumption, the optimal (minimum MSE) estimate of $z_T$
is $\hat{z}_T = E[z_T \mid z_1, \ldots, z_{T-1}, z_{T+1}, \ldots, z_n]$, which is also the best linear estimate without the
normality assumption. Then, by writing the joint density of $\mathbf{z} = (z_1, \ldots, z_n)'$ in the form

$$p(z_1, \ldots, z_{T-1}, z_{T+1}, \ldots, z_n) \, p(z_T \mid z_1, \ldots, z_{T-1}, z_{T+1}, \ldots, z_n)$$

from basic properties of the multivariate normal distribution and its conditional
distributions, it is easily deduced that the estimate $\hat{z}_T$, the conditional mean, is identical to the
value of $z_T$ that minimizes the “sum-of-squares” function in the exponent of the joint
multivariate normal density of $\mathbf{z}$. Thus, since $z_T$ occurs only in $p + 1$ terms of the exponent
sum of squares, this reduces to finding the value of $z_T$ to minimize $S = \sum_{i=0}^{p} a_{T+i}^2$, where
$a_t = z_t - \sum_{i=1}^{p} \phi_i z_{t-i}$. Now we obtain

$$\frac{\partial S}{\partial z_T} = 2 \left( z_T - \sum_{l=1}^{p} \phi_l z_{T-l} \right) - 2 \sum_{i=1}^{p} \phi_i \left( z_{T+i} - \sum_{l=1}^{p} \phi_l z_{T+i-l} \right)$$

$$= 2 \left[ \left( 1 + \sum_{i=1}^{p} \phi_i^2 \right) z_T + \sum_{i=0}^{p} \sum_{\substack{l=0 \\ l \ne i}}^{p} \phi_i \phi_l z_{T+i-l} \right]$$

$$= 2 \left[ \left( 1 + \sum_{i=1}^{p} \phi_i^2 \right) z_T + \sum_{j=1}^{p} \left( \sum_{i=j}^{p} \phi_i \phi_{i-j} \right) (z_{T-j} + z_{T+j}) \right]$$

where $\phi_0 = -1$. Setting this partial derivative to zero and solving for $z_T$, we find that
the estimate is given by $\hat{z}_T = -d_0^{-1} \sum_{j=1}^{p} d_j (z_{T-j} + z_{T+j})$, where $d_j = \sum_{i=j}^{p} \phi_i \phi_{i-j}$ and
$d_0 = 1 + \sum_{i=1}^{p} \phi_i^2$. Notice that the estimate $\hat{z}_T$ can be seen to be determined from the
solution for $z_T$ to the relation $\phi(F)\phi(B) z_T = 0$, where $\phi(B) = 1 - \sum_{i=1}^{p} \phi_i B^i$ is the AR($p$)
operator. It can also be established that the error variance of the missing data estimate is
given by $E[(z_T - \hat{z}_T)^2] = \sigma_a^2 d_0^{-1}$.
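
The interpolation formula is simple to compute once the coefficients $d_j$ are formed. The following sketch is offered only as an illustration under the conditions stated above (a single missing value with at least $p$ observed neighbors on each side); the function name `ar_interpolate` and its argument layout are hypothetical.

```python
def ar_interpolate(phi, before, after):
    """Interpolate one missing value z_T in an AR(p) series.

    phi    : [phi_1, ..., phi_p]
    before : [z_{T-1}, z_{T-2}, ..., z_{T-p}]
    after  : [z_{T+1}, z_{T+2}, ..., z_{T+p}]
    Returns the estimate and the error variance divided by sigma_a^2.
    """
    p = len(phi)
    if len(before) != p or len(after) != p:
        raise ValueError("need exactly p neighbors on each side")
    ext = [-1.0] + list(phi)  # ext[i] = phi_i, with phi_0 = -1
    # d_j = sum_{i=j}^{p} phi_i phi_{i-j}; d_0 = 1 + sum of phi_i^2
    d = [sum(ext[i] * ext[i - j] for i in range(j, p + 1)) for j in range(p + 1)]
    z_hat = -sum(d[j] * (before[j - 1] + after[j - 1])
                 for j in range(1, p + 1)) / d[0]
    return z_hat, 1.0 / d[0]


# AR(1) with phi = 0.5, z_{T-1} = 2.0, z_{T+1} = 1.0 (made-up values).
z_hat_ar1, rel_var_ar1 = ar_interpolate([0.5], [2.0], [1.0])
```

For the AR(1) case the result reduces to $\phi (z_{T-1} + z_{T+1})/(1 + \phi^2)$, which provides a quick check of the implementation.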

In the general ARMA model situation, Bruce and Martin (1989) and Ljung (1993), 
among others, have noted a close connection between the likelihood function construction 
in the case of missing values and the formulation of the consecutive data model likelihood 
with AOs specified for each time point that corresponds to a missing value. Hence, in effect, 
in such a time series AO model for consecutive data, for given values of the ARMA model 
parameters, the estimate of the outlier effect parameter $\omega$ corresponds to the interpolation
error in the missing data situation. For example, in the autoregressive model situation,
compare the result in (13.3.13) with the result given following (13.2.5b) for the AO model.
Specifically, since $\pi(B) = \phi(B)$ in the AR($p$) model, $e_T = \phi(B) Y_T$ and the estimate in
(13.2.5b) reduces to $\hat{\omega}_{A,T} = [\phi(F)\phi(B) Y_T]/d_0 = Y_T - \hat{Y}_T = \hat{e}_T$, the interpolation error
given in (13.3.13). Furthermore, the sum-of-squares function in the likelihood (13.3.9) for 
the missing data situation is equal to the sum of squares obtained from a complete set of 
consecutive observations in which an AO has been assumed at each time point where a 
missing value occurs and for which the likelihood is evaluated at the maximum likelihood 
estimates for each of the corresponding AO effect parameters $\omega$, for given values of the
time series model parameters $\phi$ and $\theta$. As an illustration, for the simple AR(1) model
situation with a single isolated missing value at time T, from (13.3.11) the relevant term in 
the missing data sum-of-squares function is 


$$\frac{(z_{T+1} - \phi^2 z_{T-1})^2}{1 + \phi^2} = [(z_T - \hat{\omega}) - \phi z_{T-1}]^2 + [z_{T+1} - \phi (z_T - \hat{\omega})]^2 \equiv (\hat{z}_T - \phi z_{T-1})^2 + (z_{T+1} - \phi \hat{z}_T)^2 \tag{13.3.14}$$

where

$$\hat{\omega} = z_T - \frac{\phi}{1 + \phi^2} (z_{T-1} + z_{T+1}) = z_T - \hat{z}_T$$

is the maximum likelihood estimate of the outlier effect $\omega$ in the AO model (13.2.1), and
the latter expressions in (13.3.14) represent the sum-of-squares terms in the consecutive
data situation but with an AO modeled at time $T$.
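
The identity (13.3.14) can be confirmed numerically. The short sketch below is an illustration only; the function name `ao_identity_terms` and the numbers are hypothetical. It evaluates the missing data term on the left and the two AO-adjusted squared residuals on the right.

```python
def ao_identity_terms(phi, z_prev, z_obs, z_next):
    """Return both sides of (13.3.14) and the AO estimate omega_hat."""
    z_hat = phi * (z_prev + z_next) / (1.0 + phi ** 2)  # interpolated value
    omega_hat = z_obs - z_hat                           # estimated AO effect
    missing_term = (z_next - phi ** 2 * z_prev) ** 2 / (1.0 + phi ** 2)
    ao_terms = (((z_obs - omega_hat) - phi * z_prev) ** 2
                + (z_next - phi * (z_obs - omega_hat)) ** 2)
    return missing_term, ao_terms, omega_hat


# Made-up values; z_obs plays the role of the value recorded at time T.
lhs, rhs, omega_hat = ao_identity_terms(phi=0.7, z_prev=1.5, z_obs=4.0, z_next=-0.2)
```

The two sums of squares agree whatever value is supplied for `z_obs`, since the AO adjustment replaces it by the interpolated value.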

Treating missing data as additive outliers does have an impact on estimation of the
ARMA model parameters $\phi$ and $\theta$, however, and ML estimates of these parameters in the
missing data case are not identical to estimates that maximize a complete data likelihood
for which an AO has been assumed at each time point where a missing value occurs. In fact,
Basu and Reinsel (1996) established that MLEs of $\phi$ and $\theta$ for the missing data situation
are the same as estimates obtained from a model that assumes complete data with an AO 
at each time point where a missing value occurs when the method of restricted maximum 
likelihood estimation (e.g., as discussed in Section 9.5.2) is employed for this latter model 
formulation. We provide the following argument to establish this result. 


Connection Between Exact Likelihood Function for Missing Data Situation and Restricted
Likelihood. Let $\mathbf{z}_n = (z_{t_1}, z_{t_2}, \ldots, z_{t_n})'$ denote the $n \times 1$ vector of observations
from the ARMA($p$, $q$) process $\phi(B) z_t = \theta(B) a_t$ with $t_1 = 1$ and $t_n = T$. Let $\mathbf{z}_0$ denote
the $T \times 1$ vector consisting of the observations $\mathbf{z}_n$ with 0’s inserted for values of times
where observations are missing, and for convenience arrange as $\mathbf{z}_0 = (\mathbf{z}_n', \mathbf{0}')'$. Also, let
$\mathbf{z} = (\mathbf{z}_n', \mathbf{z}_m')'$ denote the corresponding vector of (complete) values of the process, where
$\mathbf{z}_m$ is the $m \times 1$ vector of the “missing values,” with $T = n + m$. We can write

$$\mathbf{z}_0 = \mathbf{X} \boldsymbol{\omega} + \mathbf{z} \tag{13.3.15}$$

where $\mathbf{X}$ is a $T \times m$ matrix with columns that are “pulse” unit vectors to indicate the
$m$ missing values, specifically, $\mathbf{X} = [\mathbf{0}, \mathbf{I}_m]'$ under the rearrangement of the data. Thus,
(13.3.15) can be interpreted as a model that allows for AOs, with parameters $\boldsymbol{\omega}$, at all time
points where a missing value occurs. Note that $\mathbf{z}_n = \mathbf{H}'\mathbf{z} = \mathbf{H}'\mathbf{z}_0$, where $\mathbf{H}' = [\mathbf{I}_n, \mathbf{0}]$ is the
$n \times T$ matrix whose rows are pulse unit vectors to indicate the $n$ observed values.

From one perspective, (13.3.15) can be viewed as a “regression model” for the extended
data vector $\mathbf{z}_0$ with $\boldsymbol{\omega}$ treated as unknown parameters and ARMA noise process $\{z_t\}$. (Note
in fact that $\boldsymbol{\omega} = -\mathbf{z}_m$ by actual definition.) Let $\sigma_a^2 \mathbf{V}_* = \operatorname{cov}[\mathbf{z}]$ denote the $T \times T$ covariance
matrix of the complete series of values. Then, the form of the restricted likelihood function
for the extended data vector $\mathbf{z}_0$ under this regression model is given as in (9.5.11) of
Section 9.5.2,


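$$L_*(\boldsymbol{\phi}, \boldsymbol{\theta}, \sigma_a^2; \mathbf{z}_0) \propto (\sigma_a^2)^{-n/2} |\mathbf{V}_*|^{-1/2} |\mathbf{X}'\mathbf{V}_*^{-1}\mathbf{X}|^{-1/2} \exp\left[ -\frac{1}{2\sigma_a^2} (\mathbf{z}_0 - \mathbf{X}\hat{\boldsymbol{\omega}})' \mathbf{V}_*^{-1} (\mathbf{z}_0 - \mathbf{X}\hat{\boldsymbol{\omega}}) \right] \tag{13.3.16}$$

where $\hat{\boldsymbol{\omega}} = (\mathbf{X}'\mathbf{V}_*^{-1}\mathbf{X})^{-1} \mathbf{X}'\mathbf{V}_*^{-1} \mathbf{z}_0$. Recall from discussion in Section 9.5.2, however, that
(13.3.16) has an equivalent representation as the density of the “error contrast vector”
$\mathbf{H}'\mathbf{z}_0$, since $\mathbf{H}'$ is a full rank $(T - m) \times T$ matrix such that $\mathbf{H}'\mathbf{X} = \mathbf{0}$. Then noting that
$\mathbf{H}'\mathbf{z}_0 = \mathbf{z}_n$, the observed data vector, expression (13.3.16) also represents the density of $\mathbf{z}_n$
and hence represents the exact likelihood based on the observed data vector $\mathbf{z}_n$, essentially
by definition. However, we would now like to directly verify the equivalence between
(13.3.16) and the exact likelihood (density) function of the observed data vector $\mathbf{z}_n$.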
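
For this, we express the covariance matrix of $\mathbf{z} = (\mathbf{z}_n', \mathbf{z}_m')'$ in partitioned form as

$$\operatorname{cov}[\mathbf{z}] = \sigma_a^2 \mathbf{V}_* = \sigma_a^2 \begin{bmatrix} \mathbf{V}_{11} & \mathbf{V}_{12} \\ \mathbf{V}_{21} & \mathbf{V}_{22} \end{bmatrix}$$

where $\sigma_a^2 \mathbf{V}_{11} = \operatorname{cov}[\mathbf{z}_n]$ in particular. We let $\mathbf{V}^{ij}$, $i, j = 1, 2$, denote the block matrices
of $\mathbf{V}_*^{-1}$ corresponding to the above partitioning of $\mathbf{V}_*$. Then using basic results for
partitioned matrices (e.g., Rao (1965), or Appendix A7.1.1), we can readily derive that
the restricted likelihood expression in (13.3.16) is the same as the likelihood (density) for
the observed data vector $\mathbf{z}_n$. That is, from results on partitioned matrices, we first have that

$$\mathbf{X}'\mathbf{V}_*^{-1}\mathbf{X} = \mathbf{V}^{22} = (\mathbf{V}_{22} - \mathbf{V}_{21}\mathbf{V}_{11}^{-1}\mathbf{V}_{12})^{-1} \tag{13.3.17}$$

and $|\mathbf{V}_*| = |\mathbf{V}_{11}| \, |\mathbf{V}_{22} - \mathbf{V}_{21}\mathbf{V}_{11}^{-1}\mathbf{V}_{12}|$. Hence, the determinant factor in (13.3.16) is
$|\mathbf{V}_*|^{-1/2} |\mathbf{X}'\mathbf{V}_*^{-1}\mathbf{X}|^{-1/2} = |\mathbf{V}_{11}|^{-1/2}$. Also, the quadratic form in (13.3.16) is expressible as

$$\mathbf{z}_0' [\mathbf{V}_*^{-1} - \mathbf{V}_*^{-1}\mathbf{X}(\mathbf{X}'\mathbf{V}_*^{-1}\mathbf{X})^{-1}\mathbf{X}'\mathbf{V}_*^{-1}] \mathbf{z}_0 = \mathbf{z}_n' [\mathbf{V}^{11} - \mathbf{V}^{12}(\mathbf{V}^{22})^{-1}\mathbf{V}^{21}] \mathbf{z}_n = \mathbf{z}_n' \mathbf{V}_{11}^{-1} \mathbf{z}_n$$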

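again using a basic result on the inverse of a partitioned matrix. Therefore, expression
(13.3.16) is equal to

$$p(\mathbf{z}_n) \propto (\sigma_a^2)^{-n/2} |\mathbf{V}_{11}|^{-1/2} \exp\left[ -\frac{1}{2\sigma_a^2} \mathbf{z}_n' \mathbf{V}_{11}^{-1} \mathbf{z}_n \right] \tag{13.3.18}$$

which, since $\mathbf{z}_n$ is distributed as normal $N(\mathbf{0}, \sigma_a^2 \mathbf{V}_{11})$, is the likelihood based on the observed
data vector $\mathbf{z}_n$.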

This equivalence establishes a device for obtaining ML estimates in ARMA models
with missing values by using an REML estimation routine for the extended data vector $\mathbf{z}_0$
by setting up a regression component $\mathbf{X}\boldsymbol{\omega}$ that includes an indicator variable (AO term) for
each missing observation. Estimation of the “extended data” regression model (13.3.15)
with ARMA errors by the method of REML then results in ML estimates of the ARMA
model parameters based on the observed data $\mathbf{z}_n$. Finally, we note that the GLS estimate of
$\boldsymbol{\omega}$ in model (13.3.15) is

$$\hat{\boldsymbol{\omega}} = (\mathbf{X}'\mathbf{V}_*^{-1}\mathbf{X})^{-1}\mathbf{X}'\mathbf{V}_*^{-1}\mathbf{z}_0 = (\mathbf{V}^{22})^{-1}\mathbf{V}^{21}\mathbf{z}_n = -\mathbf{V}_{21}\mathbf{V}_{11}^{-1}\mathbf{z}_n = -E[\mathbf{z}_m \mid \mathbf{z}_n] \tag{13.3.19}$$

so the estimates of the missing values $\mathbf{z}_m$ are obtained as $\hat{\mathbf{z}}_m = -\hat{\boldsymbol{\omega}}$ immediately as
a by-product of the fitting of the model (13.3.15), with estimation error covariance
matrix $\operatorname{cov}[\hat{\boldsymbol{\omega}} - \boldsymbol{\omega}] = \operatorname{cov}[\mathbf{z}_m - \hat{\mathbf{z}}_m] = \sigma_a^2 (\mathbf{X}'\mathbf{V}_*^{-1}\mathbf{X})^{-1}$ directly available as well. In addition,
for a complete data vector situation, if there were additive outliers specified at the
given times corresponding to $\mathbf{z}_m$, then model (13.3.15) could be used to estimate “smoothed
values” of the observations at all times where an AO is proposed to occur, $\hat{\mathbf{z}}_m = -\hat{\boldsymbol{\omega}}$ as
given in (13.3.19), and magnitudes of the outliers can be estimated by the differences
between the observed values and the interpolated values, $\mathbf{z}_m - \hat{\mathbf{z}}_m$.
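
The GLS relation (13.3.19) can be illustrated for a small AR(1) example. The sketch below is not the authors' code; it assumes a zero-mean stationary AR(1), for which $\mathbf{V}_*$ has elements $\phi^{|s-t|}/(1 - \phi^2)$, and uses a hand-written Gauss-Jordan inverse so that no external library is needed. The names `inverse` and `gls_ao_estimate` are hypothetical.

```python
def inverse(a):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    aug = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def gls_ao_estimate(phi, times_obs, z_obs, times_miss):
    """GLS estimate of omega in (13.3.15) for an AR(1), data ordered (z_n, z_m).

    Returns omega_hat and (X'V*^{-1}X)^{-1}, the error covariance / sigma_a^2.
    """
    order = list(times_obs) + list(times_miss)
    n, m = len(times_obs), len(times_miss)
    V = [[phi ** abs(s - t) / (1.0 - phi ** 2) for t in order] for s in order]
    Vinv = inverse(V)
    V22_upper = [row[n:] for row in Vinv[n:]]      # X'V*^{-1}X = V^{22}
    rhs = [sum(Vinv[n + i][j] * z_obs[j] for j in range(n))
           for i in range(m)]                      # X'V*^{-1}z_0 = V^{21} z_n
    cov_scaled = inverse(V22_upper)
    omega_hat = [sum(cov_scaled[i][k] * rhs[k] for k in range(m))
                 for i in range(m)]
    return omega_hat, cov_scaled


# One value missing at t = 2, observed z_1 = 2.0 and z_3 = 1.0 (made-up data).
omega_hat, cov_scaled = gls_ao_estimate(0.5, [1, 3], [2.0, 1.0], [2])
z_m_hat = [-w for w in omega_hat]
```

In this example `z_m_hat` should coincide with the AR(1) interpolation $\phi (z_1 + z_3)/(1 + \phi^2)$, and `cov_scaled` with $d_0^{-1} = (1 + \phi^2)^{-1}$.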


EXERCISES 


13.1. In an analysis (Box and Tiao, 1975) of monthly data $Y_t$ on smog-producing oxidant,
allowance was made for two possible “interventions” $I_1$ and $I_2$ as follows:

$I_1$: In early 1960, diversion of traffic from the opening of the Golden State Freeway
and the coming into effect of a law reducing reactive hydrocarbons in gasoline
sold locally.

$I_2$: In 1966, the coming into effect of a law requiring all new cars to have modified
engine design. In the case of this intervention, allowance was made for the 
well-known fact that the smog phenomenon is different in summer and winter 
months. 

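In a pilot analysis of the data, the following intervention model was used:

$$Y_t = \omega_1 \xi_{1t} + \frac{\omega_2}{1 - B^{12}} \xi_{2t} + \frac{\omega_3}{1 - B^{12}} \xi_{3t} + \frac{(1 - \theta B)(1 - \Theta B^{12})}{1 - B^{12}} a_t$$

where

$$\xi_{1t} = \begin{cases} 0 & t < \text{Jan. 1960} \\ 1 & t \ge \text{Jan. 1960} \end{cases} \qquad \xi_{2t} = \begin{cases} 0 & t < \text{Jan. 1966} \\ 1 & t \ge \text{Jan. 1966} \end{cases} \ \text{(summer months)} \qquad \xi_{3t} = \begin{cases} 0 & t < \text{Jan. 1966} \\ 1 & t \ge \text{Jan. 1966} \end{cases} \ \text{(winter months)}$$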


(a) Show that the model allows for the following: 

(1) A possible step change in January 1960 of size $\omega_1$, possibly produced by $I_1$.

(2) A “staircase function” of annual step size $\omega_2$ to allow for possible summer
effect of cumulative influx of cars with new engine design.

(3) A “staircase function” of annual step size $\omega_3$ to allow for possible winter
effect of cumulative influx of cars with new engine design. 

(b) Describe what steps you would take to check the representational adequacy of 
the model. 

(c) Assuming you were satisfied with the checking after (b), what conclusions would 
you draw from the following results? (Estimates are shown with their standard 
errors below in parentheses.) 

$\hat{\omega}_1 = -1.09$   $\hat{\omega}_2 = -0.25$   $\hat{\omega}_3 = -0.07$   $\hat{\theta} = -0.24$   $\hat{\Theta} = 0.55$
(±0.13)   (±0.07)   (±0.06)   (±0.03)   (±0.04)

(d) The data for this analysis are listed as Series R in the Collection of Time Series 
in Part Five. Use these data to perform your own intervention analysis. 

13.2. A general transfer function model of the form

$$Y_t = \sum_{j=1}^{k} \delta_j^{-1}(B) \omega_j(B) \xi_{jt} + \phi^{-1}(B)\theta(B) a_t \equiv \mathcal{Y}_t + N_t$$

can include input variables $\xi_j$, which are themselves time series, and other inputs
that are indicator variables. The latter can estimate (and eliminate) the effects
of interventions of the kind described in Exercise 13.1 and, in particular, are often
useful in the analysis of sales data.

Let $\xi_t^{(T)}$ be an indicator variable that takes the form of a unit pulse at time $T$,
that is

$$\xi_t^{(T)} = \begin{cases} 0 & t \ne T \\ 1 & t = T \end{cases}$$

For illustration, consider the models

(1) $\mathcal{Y}_t = \dfrac{\omega_1 B}{1 - \delta B} \xi_t^{(T)}$ (with $\omega_1 = 1.0$, $\delta = 0.5$)

(2) $\mathcal{Y}_t = \left( \dfrac{\omega_1 B}{1 - \delta B} + \dfrac{\omega_2 B}{1 - B} \right) \xi_t^{(T)}$ (with $\omega_1 = 1.0$, $\delta = 0.5$, $\omega_2 = 0.3$)

(3) $\mathcal{Y}_t = \left( \omega_0 + \dfrac{\omega_1 B}{1 - \delta B} + \dfrac{\omega_2 B}{1 - B} \right) \xi_t^{(T)}$ (with $\omega_0 = 1.5$, $\omega_1 = -1.0$, $\delta = 0.5$, $\omega_2 = -0.5$)

Compute recursively the response $\mathcal{Y}_t$ for each of these models at times $t = T, T + 1, T + 2, \ldots$
and comment on their possible usefulness in the estimation and/or
elimination of effects due to such phenomena as advertising campaigns, promotions,
and price changes.

13.3. Figure 13.2 shows the residuals before and after an outlier adjustment for the 
temperature data in Series C. Construct a similar graph for the viscosity data in 
Series D. 

13.4. A time series defined as $z_t = 1000 \log_{10}(H_t)$, where $H_t$ is the price of hogs recorded
annually by the U.S. Census of Agriculture over the period 1867-1948, was considered
in Exercise 6.6.

(a) Estimate the parameters of the model identified for this series. Perform diagnostic
checks to determine the adequacy of the fitted model.

(b) Are additive or innovational outliers present in this series? 

(c) If outliers are found, perform the appropriate adjustments to the basic ARIMA 
model and evaluate the results. 

13.5. Daily air quality measurements in New York, May-September 1973, are available 
in the data file “airquality” in the R datasets package. The file provides data on 
four air quality variables, including the solar radiation measured from 8 a.m. to 12 
noon at Central Park. The solar radiation series has a few missing values. 

(a) Assuming that an AR(1) is appropriate for the series, derive an expression for
the conditional expectation of the missing values, given the available data. 

(b) Repeat the derivation in part (a) assuming that an AR(2) model is appropriate for 
the series. 

(c) How would you evaluate the AR assumptions and proceed to develop a suitable 
model for this series? 



14 


MULTIVARIATE TIME SERIES ANALYSIS 


Multivariate time series analysis involves the use of stochastic models to describe and 
analyze the relationships among several time series. While the focus in most of the earlier 
chapters has been on univariate methods, we will now assume that $k$ time series, denoted
as $z_{1t}, z_{2t}, \ldots, z_{kt}$, are to be analyzed, and we let $\mathbf{Z}_t = (z_{1t}, \ldots, z_{kt})'$ denote the time series
vector at time $t$, for $t = 0, \pm 1, \ldots$. Such multivariate processes are of interest in a variety of
fields such as economics, business, the social sciences, earth sciences (e.g., meteorology 
and geophysics), environmental sciences, and engineering. For example, in an engineering 
setting, one may be interested in the study of the simultaneous behavior over time of current 
and voltage, or of pressure, temperature, and volume. In economics, we may be interested 
in the variations of interest rates, money supply, unemployment, and so on, while sales 
volume, prices, and advertising expenditures for a particular commodity may be of interest 
in a business context. Multiple time series of this type may be contemporaneously related, 
some series may lead other series, or there may exist feedback relationships between the 
series. 

In the study of multivariate processes, a framework is needed for describing not only 
the properties of the individual series but also the possible cross relationships among the 
series. Two key purposes for analyzing and modeling the series jointly are: 

1. To understand the dynamic relationships over time among the series. 

2. To improve accuracy of forecasts for individual series by utilizing the additional 
information available from the related series in the forecasts for each series. 

With these objectives in mind, we begin this chapter by introducing some basic concepts 
and tools that are needed for modeling multivariate time series. We then describe the vector 
autoregressive, or VAR, models that are widely used in applied work. The properties of 
these models are examined and methods for model identification, parameter estimation, and 
model checking are described. This is followed by a discussion of vector moving average 
and mixed vector autoregressive-moving average models, along with associated modeling 
tools. A brief discussion of nonstationary unit-root models and cointegration among vector 
time series is also included. We find that most of the basic concepts and results from 
univariate time series analysis extend to the multivariate case. However, new problems and 
challenges arise in the modeling of multivariate time series due to the greater complexity 
of models and parametrizations in the vector case. Methods designed to overcome such 
challenges are discussed. For a more detailed coverage of various aspects of multivariate 
time series analysis, see, for example, Reinsel (1997), Lütkepohl (2006), and Tsay (2014).


14.1 STATIONARY MULTIVARIATE TIME SERIES 

Let $\mathbf{Z}_t = (z_{1t}, \ldots, z_{kt})'$, $t = 0, \pm 1, \pm 2, \ldots$, denote a $k$-dimensional time series vector of
random variables of interest. The choice of the univariate component time series $z_{jt}$ that
are included in $\mathbf{Z}_t$ will depend on the subject matter area and an understanding of the
system under study, but it is implicit that the component series will be interrelated both 
contemporaneously and across time lags. The representation and modeling of these dynamic 
interrelationships is of main interest in multivariate time series analysis. Similar to the 
univariate case, an important concept in the model representation and analysis, which 
enables useful modeling results to be obtained from a finite sample realization of the series, 
is that of stationarity. 

The vector process $\{\mathbf{Z}_t\}$ is (strictly) stationary if the probability distributions of the
random vectors $(\mathbf{Z}_{t_1}, \mathbf{Z}_{t_2}, \ldots, \mathbf{Z}_{t_m})$ and $(\mathbf{Z}_{t_1+l}, \mathbf{Z}_{t_2+l}, \ldots, \mathbf{Z}_{t_m+l})$ are the same for arbitrary
times $t_1, t_2, \ldots, t_m$, all $m$, and all lags or leads $l = 0, \pm 1, \pm 2, \ldots$. Thus, the probability
distribution of observations from a stationary vector process is invariant with respect to
shifts in time. Hence, assuming finite first and second moments exist, for a stationary
process we must have $E[\mathbf{Z}_t] = \boldsymbol{\mu}$, constant for all $t$, where $\boldsymbol{\mu} = (\mu_1, \mu_2, \ldots, \mu_k)'$ is the
mean vector of the process. Also, the vectors $\mathbf{Z}_t$ must have a constant covariance matrix
for all $t$, which we denote by $\boldsymbol{\Sigma}_z = \boldsymbol{\Gamma}(0) = E[(\mathbf{Z}_t - \boldsymbol{\mu})(\mathbf{Z}_t - \boldsymbol{\mu})']$. A less stringent definition
of second-order, or covariance, stationarity will be provided below.

14.1.1 Cross-Covariance and Cross-Correlation Matrices 

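For a stationary process $\{\mathbf{Z}_t\}$ the covariance between $z_{it}$ and $z_{j,t+l}$ must depend only on
the lag $l$, not on time $t$, for $i, j = 1, \ldots, k$, $l = 0, \pm 1, \pm 2, \ldots$. Hence, similar to definitions
used in Section 12.1.1, we define the cross-covariance between the series $z_{it}$ and $z_{jt}$ at lag
$l$ as

$$\gamma_{ij}(l) = \operatorname{cov}[z_{it}, z_{j,t+l}] = E[(z_{it} - \mu_i)(z_{j,t+l} - \mu_j)]$$

and denote the $k \times k$ matrix of cross-covariances at lag $l$ as

$$\boldsymbol{\Gamma}(l) = E[(\mathbf{Z}_t - \boldsymbol{\mu})(\mathbf{Z}_{t+l} - \boldsymbol{\mu})'] = \begin{bmatrix} \gamma_{11}(l) & \gamma_{12}(l) & \cdots & \gamma_{1k}(l) \\ \gamma_{21}(l) & \gamma_{22}(l) & \cdots & \gamma_{2k}(l) \\ \vdots & \vdots & & \vdots \\ \gamma_{k1}(l) & \gamma_{k2}(l) & \cdots & \gamma_{kk}(l) \end{bmatrix} \tag{14.1.1}$$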
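
for $l = 0, \pm 1, \pm 2, \ldots$. The corresponding cross-correlations at lag $l$ are

$$\rho_{ij}(l) = \operatorname{corr}[z_{it}, z_{j,t+l}] = \frac{\gamma_{ij}(l)}{\{\gamma_{ii}(0) \gamma_{jj}(0)\}^{1/2}}$$

with $\gamma_{ii}(0) = \operatorname{var}[z_{it}]$. Thus, for $i = j$, $\rho_{ii}(l) = \rho_{ii}(-l)$ denotes the autocorrelation function
of the $i$th series $z_{it}$, and for $i \ne j$, $\rho_{ij}(l) = \rho_{ji}(-l)$ denotes the cross-correlation function
between the series $z_{it}$ and $z_{jt}$. The $k \times k$ cross-correlation matrix $\boldsymbol{\rho}(l)$ at lag $l$, with $(i, j)$th
element equal to $\rho_{ij}(l)$, is given by

$$\boldsymbol{\rho}(l) = \mathbf{V}^{-1/2} \boldsymbol{\Gamma}(l) \mathbf{V}^{-1/2} = \{\rho_{ij}(l)\} \tag{14.1.2}$$

for $l = 0, \pm 1, \pm 2, \ldots$, where $\mathbf{V}^{-1/2} = \operatorname{diag}\{\gamma_{11}(0)^{-1/2}, \ldots, \gamma_{kk}(0)^{-1/2}\}$. Note that $\boldsymbol{\Gamma}(l)' = \boldsymbol{\Gamma}(-l)$
and $\boldsymbol{\rho}(l)' = \boldsymbol{\rho}(-l)$, since $\gamma_{ij}(l) = \gamma_{ji}(-l)$. In addition, the cross-covariance matrices
$\boldsymbol{\Gamma}(l)$ and cross-correlation matrices $\boldsymbol{\rho}(l)$ are nonnegative definite, since

$$\operatorname{var}\left[ \sum_{i=1}^{n} \mathbf{b}_i' \mathbf{Z}_{t-i} \right] = \sum_{i=1}^{n} \sum_{j=1}^{n} \mathbf{b}_i' \boldsymbol{\Gamma}(i - j) \mathbf{b}_j \ge 0$$

for all positive integers $n$ and all $k$-dimensional constant vectors $\mathbf{b}_1, \ldots, \mathbf{b}_n$.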
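
As a small illustration of (14.1.2), the sketch below rescales a cross-covariance matrix into the corresponding cross-correlation matrix. It is only a sketch: the function name `cross_correlation` is hypothetical, and the two matrices are made-up numbers used to exercise the arithmetic, not quantities from a fitted model.

```python
import math


def cross_correlation(gamma0, gamma_l):
    """Compute rho(l) = V^{-1/2} Gamma(l) V^{-1/2} from Gamma(0) and Gamma(l)."""
    k = len(gamma0)
    scale = [1.0 / math.sqrt(gamma0[i][i]) for i in range(k)]
    return [[scale[i] * gamma_l[i][j] * scale[j] for j in range(k)]
            for i in range(k)]


# Illustrative lag-0 and lag-1 cross-covariance matrices for k = 2.
gamma0 = [[4.0, 1.0], [1.0, 1.0]]
gamma1 = [[2.0, 0.2], [0.8, 0.3]]

rho0 = cross_correlation(gamma0, gamma0)
rho1 = cross_correlation(gamma0, gamma1)

# Gamma(-1) is the transpose of Gamma(1), so rho(-1) is the transpose of rho(1).
gamma_minus1 = [list(col) for col in zip(*gamma1)]
rho_minus1 = cross_correlation(gamma0, gamma_minus1)
```

Note that `rho1` is not symmetric in general: its $(1, 2)$ and $(2, 1)$ elements describe lead and lag relationships in opposite directions.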


14.1.2 Covariance Stationarity 

The definition of stationarity given above is usually referred to as strict or strong stationarity.
In general, a process $\{\mathbf{Z}_t\}$ that possesses finite first and second moments and that satisfies
the conditions that $E[\mathbf{Z}_t] = \boldsymbol{\mu}$ does not depend on $t$ and $E[(\mathbf{Z}_t - \boldsymbol{\mu})(\mathbf{Z}_{t+l} - \boldsymbol{\mu})']$ depends
only on $l$ is referred to as weak, second-order, or covariance stationary. In this chapter, the
term stationary will generally be used in this latter sense of weak stationarity. For a
stationary vector process, the cross-covariance and cross-correlation matrices provide useful
summary information on the dynamic interrelations among the components of the
process. However, because of the higher dimensionality $k > 1$ of the vector process, the
cross-correlation matrices generally have more complicated structures and can be much
more difficult to interpret than the autocorrelation functions in the univariate case. In
Sections 14.2-14.4, we will examine the covariance properties implied by vector autoregressive,
moving average, and mixed autoregressive-moving average models.


14.1.3 Vector White Noise Process 

The simplest example of a stationary vector process is the vector white noise process, 
which plays a fundamental role as a building block for general vector processes. The 
vector white noise process is defined as a sequence of random vectors $\ldots, \mathbf{a}_1, \ldots, \mathbf{a}_t, \ldots$
with $\mathbf{a}_t = (a_{1t}, \ldots, a_{kt})'$, such that $E[\mathbf{a}_t] = \mathbf{0}$, $E[\mathbf{a}_t \mathbf{a}_t'] = \boldsymbol{\Sigma}$, and $E[\mathbf{a}_t \mathbf{a}_{t+l}'] = \mathbf{0}$, for $l \ne 0$.
Hence, its covariance matrices $\boldsymbol{\Gamma}(l)$ are given by

$$\boldsymbol{\Gamma}(l) = E[\mathbf{a}_t \mathbf{a}_{t+l}'] = \begin{cases} \boldsymbol{\Sigma} & \text{for } l = 0 \\ \mathbf{0} & \text{for } l \ne 0 \end{cases} \tag{14.1.3}$$

The $k \times k$ covariance matrix $\boldsymbol{\Sigma}$ is assumed to be positive definite, since the dimension $k$ of
the process could be reduced otherwise. Sometimes, additional properties will be assumed
for the $\mathbf{a}_t$, such as normality or mutual independence over different time periods.

14.1.4 Moving Average Representation of a Stationary Vector Process 

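A multivariate generalization of Wold’s theorem states that if $\{\mathbf{Z}_t\}$ is a purely
nondeterministic (i.e., $\mathbf{Z}_t$ does not contain a purely deterministic component process whose future
values can be perfectly predicted from the past values) stationary process with mean vector
$\boldsymbol{\mu}$, then $\mathbf{Z}_t$ can be represented as an infinite vector moving average (MA) process,

$$\mathbf{Z}_t = \boldsymbol{\mu} + \sum_{j=0}^{\infty} \boldsymbol{\Psi}_j \mathbf{a}_{t-j} = \boldsymbol{\mu} + \boldsymbol{\Psi}(B) \mathbf{a}_t, \qquad \boldsymbol{\Psi}_0 = \mathbf{I} \tag{14.1.4}$$

where $\boldsymbol{\Psi}(B) = \sum_{j=0}^{\infty} \boldsymbol{\Psi}_j B^j$ is a $k \times k$ matrix in the backshift operator $B$ such that $B^j \mathbf{a}_t = \mathbf{a}_{t-j}$
and the $k \times k$ coefficient matrices $\boldsymbol{\Psi}_j$ satisfy the condition $\sum_{j=0}^{\infty} \|\boldsymbol{\Psi}_j\|^2 < \infty$, where
$\|\boldsymbol{\Psi}_j\|$ denotes the norm of $\boldsymbol{\Psi}_j$. The $\mathbf{a}_t$ form a vector white noise process with mean $\mathbf{0}$ and
covariances given by (14.1.3). The covariance matrix of $\mathbf{Z}_t$ is then given by

$$\operatorname{Cov}(\mathbf{Z}_t) = \sum_{j=0}^{\infty} \boldsymbol{\Psi}_j \boldsymbol{\Sigma} \boldsymbol{\Psi}_j'$$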

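The Wold representation in (14.1.4) is obtained by defining $\mathbf{a}_t$ as the error $\mathbf{a}_t = \mathbf{Z}_t - \hat{\mathbf{Z}}_{t-1}(1)$
of the best (i.e., minimum mean square error) one-step-ahead linear
predictor $\hat{\mathbf{Z}}_{t-1}(1)$ of $\mathbf{Z}_t$ based on the infinite past $\mathbf{Z}_{t-1}, \mathbf{Z}_{t-2}, \ldots$. Thus, the $\mathbf{a}_t$ are mutually
uncorrelated by construction since $\mathbf{a}_t$ is uncorrelated with $\mathbf{Z}_{t-j}$ for all $j \ge 1$ and, hence,
is uncorrelated with $\mathbf{a}_{t-j}$ for all $j \ge 1$, and the $\mathbf{a}_t$ have a constant covariance matrix by
stationarity of the process $\{\mathbf{Z}_t\}$. The best one-step-ahead linear predictor can be expressed
as

$$\hat{\mathbf{Z}}_{t-1}(1) = \boldsymbol{\mu} + \sum_{j=1}^{\infty} \boldsymbol{\Psi}_j \{\mathbf{Z}_{t-j} - \hat{\mathbf{Z}}_{t-j-1}(1)\} = \boldsymbol{\mu} + \sum_{j=1}^{\infty} \boldsymbol{\Psi}_j \mathbf{a}_{t-j}$$

Consequently, the coefficient matrices $\boldsymbol{\Psi}_j$ in (14.1.4) have the interpretation of the linear
regression matrices of $\mathbf{Z}_t$ on the $\mathbf{a}_{t-j}$ in that $\boldsymbol{\Psi}_j = \operatorname{cov}[\mathbf{Z}_t, \mathbf{a}_{t-j}] \boldsymbol{\Sigma}^{-1}$.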

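In what follows, we will assume that $\boldsymbol{\Psi}(B)$ can be represented (at least approximately, in
practice) as the product $\boldsymbol{\Phi}^{-1}(B) \boldsymbol{\Theta}(B)$, where $\boldsymbol{\Phi}(B)$ and $\boldsymbol{\Theta}(B)$ are finite autoregressive and
moving average matrix polynomials of orders $p$ and $q$, respectively. This leads to a class of
linear models for vector time series $\mathbf{Z}_t$ defined by a relation of the form $\boldsymbol{\Phi}(B)(\mathbf{Z}_t - \boldsymbol{\mu}) = \boldsymbol{\Theta}(B) \mathbf{a}_t$, or

$$(\mathbf{Z}_t - \boldsymbol{\mu}) - \sum_{j=1}^{p} \boldsymbol{\Phi}_j (\mathbf{Z}_{t-j} - \boldsymbol{\mu}) = \mathbf{a}_t - \sum_{j=1}^{q} \boldsymbol{\Theta}_j \mathbf{a}_{t-j} \tag{14.1.5}$$

A process $\{\mathbf{Z}_t\}$ is referred to as a vector autoregressive-moving average, or VARMA($p$, $q$),
process if it satisfies the relations (14.1.5) for a given white noise sequence $\{\mathbf{a}_t\}$.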

We begin the discussion of this class of vector models by examining the special case 
when q is zero so that the process follows a pure vector autoregressive model of order p. 
The discussion will focus on time-domain methods for analyzing vector time series and
spectral methods will not be used. However, a brief summary of the spectral characteristics
of stationary vector processes is provided in Appendix A14.1.


14.2 VECTOR AUTOREGRESSIVE MODELS 

Among multivariate time series models, vector autoregressive models are the most widely 
used in practice. A major reason for this is their similarity to ordinary regression models 
and the relative ease of fitting these models to actual time series. For example, the parameters
can be estimated using least-squares methods that yield closed-form expressions for
the estimates. Other methods from multivariate regression analysis can be used at other 
steps of the analysis. Vector autoregressive models are widely used in econometrics, for 
example, to describe the dynamic behavior of economic and financial time series and to 
produce forecasts. This section examines the properties of vector autoregressive models 
and describes methods for order specification, parameter estimation, and model checking 
that can be used to develop these models in practice. 

14.2.1 VAR(p) Model 

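A vector autoregressive model of order $p$, or VAR($p$) model, is defined as

$$\boldsymbol{\Phi}(B)(\mathbf{Z}_t - \boldsymbol{\mu}) = \mathbf{a}_t$$

where $\boldsymbol{\Phi}(B) = \mathbf{I} - \boldsymbol{\Phi}_1 B - \boldsymbol{\Phi}_2 B^2 - \cdots - \boldsymbol{\Phi}_p B^p$, $\boldsymbol{\Phi}_i$ is a $k \times k$ parameter matrix, and $\mathbf{a}_t$ is
a white noise sequence with mean $\mathbf{0}$ and covariance matrix $\boldsymbol{\Sigma}$. The model can equivalently
be written as

$$(\mathbf{Z}_t - \boldsymbol{\mu}) = \sum_{j=1}^{p} \boldsymbol{\Phi}_j (\mathbf{Z}_{t-j} - \boldsymbol{\mu}) + \mathbf{a}_t \tag{14.2.1}$$

The behavior of the process is determined by the roots of the determinantal equation
$\det\{\boldsymbol{\Phi}(B)\} = 0$. In particular, the process is stationary if all the roots of this equation are
greater than one in absolute value; that is, lie outside the unit circle (e.g., Reinsel, 1997,
Chapter 2). When this condition is met, $\{\mathbf{Z}_t\}$ has the infinite moving average representation

$$\mathbf{Z}_t = \boldsymbol{\mu} + \sum_{j=0}^{\infty} \boldsymbol{\Psi}_j \mathbf{a}_{t-j} \tag{14.2.2}$$

or $\mathbf{Z}_t = \boldsymbol{\mu} + \boldsymbol{\Psi}(B) \mathbf{a}_t$, where $\boldsymbol{\Psi}(B) = \boldsymbol{\Phi}^{-1}(B)$ and the coefficient matrices $\boldsymbol{\Psi}_j$ satisfy the
condition $\sum_{j=0}^{\infty} \|\boldsymbol{\Psi}_j\| < \infty$. Then, since $\boldsymbol{\Phi}(B) \boldsymbol{\Psi}(B) = \mathbf{I}$, the coefficient matrices can be
calculated recursively from

$$\boldsymbol{\Psi}_j = \boldsymbol{\Phi}_1 \boldsymbol{\Psi}_{j-1} + \cdots + \boldsymbol{\Phi}_p \boldsymbol{\Psi}_{j-p} \tag{14.2.3}$$

with $\boldsymbol{\Psi}_0 = \mathbf{I}$ and $\boldsymbol{\Psi}_j = \mathbf{0}$, for $j < 0$.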
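
The recursion (14.2.3) translates directly into code. The following is a minimal sketch in plain Python, with hypothetical function names and illustrative coefficient matrices; it is intended only to show the mechanics of the recursion.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def psi_weights(phis, n_weights):
    """Return [Psi_0, Psi_1, ..., Psi_{n_weights}] for a VAR(p) via (14.2.3).

    phis is the list [Phi_1, ..., Phi_p] of k x k matrices.
    """
    k = len(phis[0])
    psi = [[[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]]
    for j in range(1, n_weights + 1):
        total = [[0.0] * k for _ in range(k)]
        for i, Phi_i in enumerate(phis, start=1):
            if j - i >= 0:  # Psi_j = 0 for j < 0, so those terms drop out
                total = add(total, matmul(Phi_i, psi[j - i]))
        psi.append(total)
    return psi


# Illustrative VAR(1) and VAR(2) coefficient matrices (made-up values).
Phi1 = [[0.5, 0.1], [0.0, 0.3]]
Phi2 = [[0.2, 0.0], [0.1, 0.1]]
psi_var1 = psi_weights([Phi1], 3)
psi_var2 = psi_weights([Phi1, Phi2], 3)
```

For a VAR(1) model the weights reduce to $\boldsymbol{\Psi}_j = \boldsymbol{\Phi}_1^j$, and for a VAR(2) model $\boldsymbol{\Psi}_2 = \boldsymbol{\Phi}_1^2 + \boldsymbol{\Phi}_2$, which gives two easy checks.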

The moving average representation (14.2.2) is useful for examining the covariance
properties of the process and it has a number of other applications. As in the univariate
case, it is useful for studying forecast errors when the VAR($p$) model is used for forecasting.
It is also used in impulse response analysis to determine how current or future values of
the series are impacted by past changes or "shocks" to the system. The coefficient matrix
$\Psi_j$ shows the expected impact of a past shock $a_{t-j}$ on the current value $Z_t$. The response of a
specific variable to a shock in another variable is often of interest in applied work. However,
since the components of $a_{t-j}$ are typically correlated, the individual elements of the $\Psi_j$
can be difficult to interpret. To aid the interpretation, the covariance matrix $\Sigma$ of $a_t$ can
be diagonalized using a Cholesky decomposition $\Sigma = LL'$, where $L$ is a lower triangular
matrix with positive diagonal elements. Then, letting $b_t = L^{-1}a_t$, we have $\mathrm{Cov}(b_t) = I_k$,
and the model can be rewritten as

$$Z_t = \mu + \sum_{j=0}^{\infty} \Psi_j^{*} b_{t-j}$$

where $\Psi_0^{*} = L$ and $\Psi_j^{*} = \Psi_j L$ for $j > 0$. The matrices $\Psi_j^{*}$ are called the impulse response
weights with respect to the orthogonal innovations $b_t$. Since $L$ is a lower triangular matrix,
the ordering of the variables will, however, matter in this case. For further discussion and
for applications of impulse response analysis, see Lütkepohl (2006, Chapter 2) and Tsay
(2014, Chapter 2).
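As a rough illustration of the orthogonalization step, the sketch below computes $\Psi_j^{*} = \Psi_j L$ for a VAR(1), where $\Psi_j = \Phi^j$. The numerical values of `Phi1` and `Sigma` are assumed for illustration. Note that base R's `chol()` returns the upper triangular factor, so it is transposed to obtain $L$.

```r
# Sketch: orthogonalized impulse response weights Psi*_j = Psi_j L
# for a VAR(1). Parameter values are illustrative assumptions.
Phi1  <- matrix(c(0.8, -0.4, 0.7, 0.6), 2, 2)
Sigma <- matrix(c(4, 1, 1, 2), 2, 2)

L <- t(chol(Sigma))          # chol() gives upper R with R'R = Sigma, so L = R'
nlag <- 4
Psi_star <- vector("list", nlag + 1)   # Psi_star[[j + 1]] stores Psi*_j
Psi_j <- diag(2)                       # Psi_0 = I
for (j in 0:nlag) {
  Psi_star[[j + 1]] <- Psi_j %*% L     # Psi*_0 = L, Psi*_j = Psi_j L
  Psi_j <- Phi1 %*% Psi_j              # VAR(1): Psi_{j+1} = Phi Psi_j
}

# Covariance of b_t = L^{-1} a_t should be the identity matrix
Linv <- solve(L)
print(round(Linv %*% Sigma %*% t(Linv), 10))
```

Reordering the variables (permuting the rows and columns of `Sigma` and `Phi1`) changes `L` and hence the orthogonalized weights, which is the ordering dependence noted in the text.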


Reduced and Structural Forms. It is sometimes useful to express the VAR($p$) process in
(14.2.1) in the following slightly different form. Since the matrix $\Sigma = E[a_t a_t']$ is assumed
to be positive definite, there exists a lower triangular matrix $\Phi_0^{\#}$ with ones on the diagonal
such that $\Phi_0^{\#} \Sigma \Phi_0^{\#\prime} = \Sigma^{\#}$ is a diagonal matrix with positive diagonal elements. Hence, by
premultiplying (14.2.1) by $\Phi_0^{\#}$ we obtain the following representation:

$$\Phi_0^{\#}(Z_t - \mu) = \sum_{j=1}^{p} \Phi_j^{\#}(Z_{t-j} - \mu) + b_t \tag{14.2.4}$$

where $\Phi_j^{\#} = \Phi_0^{\#}\Phi_j$ and $b_t = \Phi_0^{\#} a_t$ with $\mathrm{Cov}[b_t] = \Sigma^{\#}$. This model displays the concurrent
dependence among the components of $Z_t$ through the lower triangular matrix $\Phi_0^{\#}$ and is
sometimes referred to as the structural form of the VAR($p$) model. The model (14.2.1) that
includes the concurrent relationships in the covariance matrix $\Sigma$ of the errors and does not
show them explicitly is referred to as the standard or reduced form of the VAR($p$) model.
Note that a diagonalizing transformation of this type was already used in the impulse
response analysis described above, where the innovations $b_t$ were further normalized to
have unit variance.
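One way to obtain the unit lower triangular matrix $\Phi_0^{\#}$ numerically is to rescale the Cholesky factor. The sketch below assumes an illustrative $\Sigma$; the variable names are hypothetical.

```r
# Sketch: unit lower triangular Phi0 with Phi0 Sigma Phi0' diagonal.
# Sigma is an illustrative (assumed) positive definite matrix.
Sigma <- matrix(c(4, 1, 1, 2), 2, 2)

L  <- t(chol(Sigma))                 # Sigma = L L', L lower triangular
L1 <- L %*% diag(1 / diag(L))        # unit diagonal: Sigma = L1 D L1'
Phi0 <- solve(L1)                    # Phi_0^#: lower triangular, ones on diagonal
Sigma_sharp <- Phi0 %*% Sigma %*% t(Phi0)   # diagonal matrix D = diag(L)^2

print(Phi0)
print(round(Sigma_sharp, 10))
```

The second row of `Phi0` holds minus the regression coefficient of $a_{2t}$ on $a_{1t}$, which is how the structural form makes the concurrent dependence explicit.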


14.2.2 Moment Equations and Yule-Walker Estimates 

For the VAR($p$) model, the covariance matrices $\Gamma(l) = \mathrm{Cov}(Z_t, Z_{t+l}) = \mathrm{Cov}(Z_{t-l}, Z_t) =
E[(Z_{t-l} - \mu)(Z_t - \mu)']$ satisfy the matrix equations

$$\Gamma(l) = \sum_{j=1}^{p} \Gamma(l-j)\Phi_j' \tag{14.2.5}$$

for $l = 1, 2, \ldots$, with $\Gamma(0) = \sum_{j=1}^{p} \Gamma(-j)\Phi_j' + \Sigma$. This result is readily derived using
(14.2.1), noting that $E[(Z_{t-l} - \mu)a_{t-j}'] = 0$, for $j < l$. The matrix equations (14.2.5) are
commonly referred to as the multivariate Yule-Walker equations for the VAR($p$) model.
For $l = 0, \ldots, p$, these equations can be used to solve for the $\Gamma(l)$ simultaneously in terms
of the AR parameter matrices $\Phi_j$ and $\Sigma$.

Conversely, the AR coefficient matrices $\Phi_1, \ldots, \Phi_p$ and $\Sigma$ can also be determined
from the $\Gamma$'s by first solving the Yule-Walker equations, for $l = 1, \ldots, p$, to obtain the
parameters $\Phi_j$. These equations can be written in matrix form as $\Gamma_p \Phi_{(p)} = \Gamma_{(p)}$, with
solution $\Phi_{(p)} = \Gamma_p^{-1}\Gamma_{(p)}$, where

$$\Phi_{(p)} = [\Phi_1, \ldots, \Phi_p]', \qquad \Gamma_{(p)} = [\Gamma(1)', \ldots, \Gamma(p)']'$$

and $\Gamma_p$ is a $kp \times kp$ matrix with $(i,j)$th block of elements equal to $\Gamma(i - j)$. Once the $\Phi_j$
are determined from this, $\Sigma$ can be obtained as

$$\Sigma = \Gamma(0) - \sum_{j=1}^{p} \Gamma(-j)\Phi_j' \equiv \Gamma(0) - \Gamma_{(p)}'\Phi_{(p)} = \Gamma(0) - \Phi_{(p)}'\Gamma_p\Phi_{(p)}$$

In practical applications, these results can be used to derive Yule-Walker estimates of the
parameters in the VAR($p$) model by replacing the variance and covariance matrices by their
estimates.
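The block-matrix solution above can be sketched in a few lines of base R. The helper `yule_walker_var` is a hypothetical name introduced here; it takes the matrices $\Gamma(0), \ldots, \Gamma(p)$ in the chapter's convention and uses $\Gamma(-l) = \Gamma(l)'$. The round-trip check uses assumed VAR(1) parameter values.

```r
# Sketch: solve the multivariate Yule-Walker equations (14.2.5).
# `yule_walker_var` is a hypothetical helper; Gamma[[l + 1]] = Gamma(l),
# l = 0, ..., p, with Gamma(l) = E[(Z_{t-l} - mu)(Z_t - mu)'].
yule_walker_var <- function(Gamma) {
  p <- length(Gamma) - 1
  k <- nrow(Gamma[[1]])
  gam <- function(l) if (l >= 0) Gamma[[l + 1]] else t(Gamma[[-l + 1]])
  Gp <- matrix(0, k * p, k * p)          # block (i, j) equals Gamma(i - j)
  for (i in 1:p) for (j in 1:p) {
    Gp[((i - 1) * k + 1):(i * k), ((j - 1) * k + 1):(j * k)] <- gam(i - j)
  }
  Gstack <- do.call(rbind, Gamma[-1])    # [Gamma(1)', ..., Gamma(p)']'
  Phi_stack <- solve(Gp, Gstack)         # [Phi_1, ..., Phi_p]'
  Sigma <- Gamma[[1]] - t(Gstack) %*% Phi_stack
  Phi <- lapply(1:p, function(j)
    t(Phi_stack[((j - 1) * k + 1):(j * k), , drop = FALSE]))
  list(Phi = Phi, Sigma = Sigma)
}

# Round trip for an assumed VAR(1): build Gamma(0), Gamma(1), then recover
Phi1  <- matrix(c(0.8, -0.4, 0.7, 0.6), 2, 2)
Sigma <- matrix(c(4, 1, 1, 2), 2, 2)
G0 <- matrix(solve(diag(4) - kronecker(Phi1, Phi1), as.vector(Sigma)), 2, 2)
G1 <- G0 %*% t(Phi1)
fit <- yule_walker_var(list(G0, G1))
print(round(fit$Phi[[1]], 6))
print(round(fit$Sigma, 6))
```

Replacing the theoretical `Gamma` list by sample covariance matrices would give Yule-Walker estimates in the sense described above.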


14.2.3 Special Case: VAR(1) Model

To examine the properties of VAR models in more detail, we will consider the VAR(1)
model,

$$Z_t = \Phi Z_{t-1} + a_t$$

where the mean vector $\mu$ is assumed to be zero for convenience. For $k = 2$, we have the
bivariate VAR(1) process

$$Z_t = \begin{bmatrix} \phi_{11} & \phi_{12} \\ \phi_{21} & \phi_{22} \end{bmatrix} Z_{t-1} + \begin{bmatrix} a_{1t} \\ a_{2t} \end{bmatrix}$$

or equivalently

$$z_{1t} = \phi_{11} z_{1,t-1} + \phi_{12} z_{2,t-1} + a_{1t}$$
$$z_{2t} = \phi_{21} z_{1,t-1} + \phi_{22} z_{2,t-1} + a_{2t}$$

where $\phi_{11}$ and $\phi_{22}$ reflect the dependence of each component on its own past. The parameter
$\phi_{12}$ shows the dependence of $z_{1t}$ on $z_{2,t-1}$ in the presence of $z_{1,t-1}$, while $\phi_{21}$ shows the
dependence of $z_{2t}$ on $z_{1,t-1}$ in the presence of $z_{2,t-1}$. Thus, if $\phi_{12} \neq 0$ and $\phi_{21} \neq 0$, then
there is a feedback relationship between the two components. On the other hand, if the
off-diagonal elements of the parameter matrix $\Phi$ are zero, that is, $\phi_{12} = \phi_{21} = 0$, then $z_{1t}$ and
$z_{2t}$ are not dynamically correlated. However, they are still contemporaneously correlated
unless $\Sigma$ is a diagonal matrix.

Relationship to Transfer Function Model. If $\phi_{12} = 0$, but $\phi_{21} \neq 0$, then $z_{1t}$ does not
depend on past values of $z_{2t}$ but $z_{2t}$ depends on past values of $z_{1t}$. A transfer function
relationship then exists with $z_{1t}$ acting as an input variable and $z_{2t}$ as an output variable.
However, unless $z_{1t}$ is uncorrelated with $a_{2t}$, the resulting model is not in the standard
transfer function form discussed in Chapter 12. To obtain the standard transfer function
model, we let $a_{1t} = b_{1t}$ and $a_{2t} = \beta a_{1t} + b_{2t}$, where $\beta$ is the regression coefficient of $a_{2t}$ on
$a_{1t}$. Under normality, the error term $b_{2t}$ is then independent of $a_{1t}$ and hence of $b_{1t}$. The
unidirectional transfer function model is obtained by rewriting the equations for $z_{1t}$ and $z_{2t}$
above in terms of the orthogonal innovations $b_{1t}$ and $b_{2t}$. Since $a_{1t} = (1 - \phi_{11}B)z_{1t}$ when
$\phi_{12} = 0$, this yields

$$(1 - \phi_{22}B)z_{2t} = \{\beta + (\phi_{21} - \beta\phi_{11})B\}z_{1t} + b_{2t}$$

where the input variable $z_{1t}$ does not depend on the noise term $b_{2t}$.

Hence, the bivariate transfer function model emerges as a special case of the bivariate
AR model, in which a unidirectional relationship exists between the variables. In general,
for a VAR(1) model in higher dimensions, $k > 2$, if the $k$ series can be arranged so that the
matrix $\Phi$ is lower triangular, then the VAR(1) model can also be expressed in the form of
unidirectional transfer function equations.


Stationarity Conditions for VAR(1) Model. The VAR(1) process is stationary if the roots
of $\det\{I - \Phi B\} = 0$ exceed one in absolute value. Since $\det\{I - \Phi B\} = 0$ if and only
if $\det\{\lambda I - \Phi\} = 0$ with $\lambda = 1/B$, it follows that the stationarity condition for the VAR(1)
model is equivalent to requiring that the eigenvalues of $\Phi$ be less than one in absolute value.
When this condition is met, the process has the convergent infinite MA representation
(14.2.2) with MA coefficient matrices $\Psi_j = \Phi^j$, since from (14.2.3) the $\Psi_j$ now satisfy

$$\Psi_j = \Phi\Psi_{j-1} = \Phi^j\Psi_0$$

To look at the stationarity for a $k$-dimensional VAR(1) model further, we note that for
arbitrary $n > 0$, by $t + n$ successive substitutions in the right-hand side of $Z_t = \Phi Z_{t-1} + a_t$
we obtain

$$Z_t = \sum_{j=0}^{t+n} \Phi^j a_{t-j} + \Phi^{t+n+1} Z_{-n-1}$$

Hence, provided that all eigenvalues of $\Phi$ are less than one in absolute value, as
$n \to \infty$ this will converge to the infinite MA representation $Z_t = \sum_{j=0}^{\infty} \Phi^j a_{t-j}$, with
$\sum_{j=0}^{\infty} \|\Phi^j\| < \infty$, which is stationary. For example, suppose that $\Phi$ has $k$ distinct
eigenvalues $\lambda_1, \ldots, \lambda_k$, so there is a $k \times k$ nonsingular matrix $P$ such that $P^{-1}\Phi P = \Lambda =
\mathrm{diag}(\lambda_1, \ldots, \lambda_k)$. Then $\Phi = P\Lambda P^{-1}$ and $\Phi^j = P\Lambda^j P^{-1}$, where $\Lambda^j = \mathrm{diag}(\lambda_1^j, \ldots, \lambda_k^j)$, so
when all $|\lambda_i| < 1$, $\sum_{j=0}^{\infty} \|\Phi^j\| < \infty$ since then $\sum_{j=0}^{\infty} \|\Lambda^j\| < \infty$.
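The eigenvalue condition is simple to check numerically. The sketch below uses base R's `eigen()`; the function name `is_stationary_var1` and the matrices are illustrative assumptions.

```r
# Sketch: stationarity check for a VAR(1) via the eigenvalues of Phi.
# `is_stationary_var1` is a hypothetical helper name.
is_stationary_var1 <- function(Phi) {
  all(Mod(eigen(Phi, only.values = TRUE)$values) < 1)
}

Phi <- matrix(c(0.8, -0.4, 0.7, 0.6), 2, 2)   # illustrative values
lam <- eigen(Phi, only.values = TRUE)$values
print(Mod(lam))                                # moduli of the eigenvalues
print(is_stationary_var1(Phi))

# Powers Phi^j shrink toward zero when all moduli are below one
Pj <- diag(2)
for (j in 1:50) Pj <- Phi %*% Pj
print(max(abs(Pj)))
```

`Mod()` is used rather than `abs()` only for clarity, since the eigenvalues of a nonsymmetric matrix may be complex.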

Moment Equations. For the VAR(1) model, the matrix Yule-Walker equations (14.2.5)
simplify to

$$\Gamma(l) = \Gamma(l-1)\Phi' \quad \text{for } l \geq 1$$

so $\Gamma(1) = \Gamma(0)\Phi'$, in particular, with

$$\Gamma(0) = \Gamma(-1)\Phi' + \Sigma = \Phi\Gamma(0)\Phi' + \Sigma$$

Hence, $\Phi'$ can be determined from $\Gamma(0)$ and $\Gamma(1)$ as $\Phi' = \Gamma(0)^{-1}\Gamma(1)$ and also $\Gamma(l) =
\Gamma(0)\Phi'^{\,l}$. This last relation illustrates that the behavior of all correlations in $\rho(l)$,
obtained using (14.1.2), will be controlled by the behavior of the $\lambda_i^l$, $i = 1, \ldots, k$, where
$\lambda_1, \ldots, \lambda_k$ are the eigenvalues of $\Phi$, and shows that even the simple VAR(1) model is
capable of fairly general correlation structures (e.g., mixtures of exponential decaying and
damping sinusoidal terms) for dimensions $k > 1$. (For more details, see Reinsel, 1997,
Section 2.2.3).
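The equation $\Gamma(0) = \Phi\Gamma(0)\Phi' + \Sigma$ is linear in the elements of $\Gamma(0)$. One standard way to solve it, sketched below as an assumption rather than as the text's own method, uses the identity $\mathrm{vec}(\Phi\Gamma\Phi') = (\Phi \otimes \Phi)\mathrm{vec}(\Gamma)$. The helper name `var1_autocov` is hypothetical and the parameter values are illustrative.

```r
# Sketch: theoretical covariance matrices of a stationary VAR(1).
# Solves vec(Gamma(0)) = (I - Phi (x) Phi)^{-1} vec(Sigma), then applies
# Gamma(l) = Gamma(l - 1) Phi'. `var1_autocov` is a hypothetical helper.
var1_autocov <- function(Phi, Sigma, nlag) {
  k <- nrow(Phi)
  G0 <- matrix(solve(diag(k^2) - kronecker(Phi, Phi), as.vector(Sigma)), k, k)
  Gamma <- vector("list", nlag + 1)     # Gamma[[l + 1]] stores Gamma(l)
  Gamma[[1]] <- G0
  for (l in seq_len(nlag)) Gamma[[l + 1]] <- Gamma[[l]] %*% t(Phi)
  Gamma
}

Phi   <- matrix(c(0.8, -0.4, 0.7, 0.6), 2, 2)   # illustrative values
Sigma <- matrix(c(4, 1, 1, 2), 2, 2)
Gam <- var1_autocov(Phi, Sigma, nlag = 2)
print(round(Gam[[1]], 3))    # Gamma(0)
print(round(Gam[[2]], 3))    # Gamma(1)

# Correlation matrices: rho(l) = V^{-1/2} Gamma(l) V^{-1/2}
Vh  <- diag(1 / sqrt(diag(Gam[[1]])))
rho1 <- Vh %*% Gam[[2]] %*% Vh
```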

14.2.4 Numerical Example 

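Consider the bivariate ($k = 2$) AR(1) model $(I - \Phi B)Z_t = a_t$ with

$$\Phi = \begin{bmatrix} 0.8 & 0.7 \\ -0.4 & 0.6 \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} 4 & 1 \\ 1 & 2 \end{bmatrix}$$

The roots of $\det\{\lambda I - \Phi\} = \lambda^2 - 1.4\lambda + 0.76 = 0$ are $\lambda = 0.7 \pm 0.5196i$, with absolute
value equal to $(0.76)^{1/2}$; hence, the AR(1) model is stationary. Since the roots are complex,
the correlations of this AR(1) process will exhibit damped sinusoidal behavior. The
covariance matrix $\Gamma(0)$ is determined by solving the linear equations $\Gamma(0) - \Phi\Gamma(0)\Phi' = \Sigma$.
Together with $\Gamma(l) = \Gamma(l-1)\Phi'$, these lead to the covariance matrices

$$\Gamma(0) = \begin{bmatrix} 18.536 & -1.500 \\ -1.500 & 8.884 \end{bmatrix}, \quad
\Gamma(1) = \begin{bmatrix} 13.779 & -8.315 \\ 5.019 & 5.931 \end{bmatrix}, \quad
\Gamma(2) = \begin{bmatrix} 5.203 & -10.500 \\ 8.166 & 1.551 \end{bmatrix}$$

$$\Gamma(3) = \begin{bmatrix} -3.188 & -8.381 \\ 7.619 & -2.336 \end{bmatrix}, \quad
\Gamma(4) = \begin{bmatrix} -8.417 & -3.754 \\ 4.460 & -4.449 \end{bmatrix}, \quad
\Gamma(5) = \begin{bmatrix} -9.361 & 1.115 \\ 0.453 & -4.453 \end{bmatrix}$$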


The corresponding correlation matrices are obtained from $\rho(l) = V^{-1/2}\Gamma(l)V^{-1/2}$, where
$V^{-1/2} = \mathrm{diag}(18.536^{-1/2}, 8.884^{-1/2})$. The autocorrelations and cross-correlations of this
process are displayed up to 18 lags in Figure 14.1. We note that the correlation patterns are
rather involved and correlations do not die out very quickly. The coefficients $\Psi_j = \Phi^j$, $j \geq
1$, in the infinite MA representation for this AR(1) process are

$$\Psi_1 = \begin{bmatrix} 0.80 & 0.70 \\ -0.40 & 0.60 \end{bmatrix}, \quad
\Psi_2 = \begin{bmatrix} 0.36 & 0.98 \\ -0.56 & 0.08 \end{bmatrix}, \quad
\Psi_3 = \begin{bmatrix} -0.10 & 0.84 \\ -0.48 & -0.34 \end{bmatrix}$$

$$\Psi_4 = \begin{bmatrix} -0.42 & 0.43 \\ -0.25 & -0.54 \end{bmatrix}, \quad
\Psi_5 = \begin{bmatrix} -0.51 & -0.03 \\ 0.02 & -0.50 \end{bmatrix}, \quad
\Psi_6 = \begin{bmatrix} -0.39 & -0.38 \\ 0.22 & -0.28 \end{bmatrix}$$

So the elements of the $\Psi_j$ matrices are also persistent and exhibit damped sinusoidal
behavior similar to that of the correlations.



FIGURE 14.1 Theoretical autocorrelations and cross-correlations, $\rho_{ij}(l)$, for the bivariate VAR(1)
process example: (a) autocorrelations $\rho_{11}(l)$ and $\rho_{22}(l)$ and (b) cross-correlations $\rho_{12}(l)$.


Finally, since $\det\{\lambda I - \Phi\} = \lambda^2 - 1.4\lambda + 0.76$, it follows from Reinsel (1997,
Section 2.2.4) that each individual series $z_{it}$ has a univariate ARMA(2, 1) model
representation as $(1 - 1.4B + 0.76B^2)z_{it} = (1 - \eta_i B)\varepsilon_{it}$, $\sigma_{\varepsilon_i}^2 = \mathrm{var}[\varepsilon_{it}]$, where $\eta_i$ and $\sigma_{\varepsilon_i}^2$ are
readily determined. For a $k$-dimensional VAR($p$) model, it can be shown that each
individual component $z_{it}$ follows a univariate ARMA of maximum order $(kp, (k-1)p)$. The
order can be much less if the AR and MA polynomials have common factors (e.g., Wei,
2006, Chapter 16).

Computations in R. The covariance matrices $\Gamma(l)$ and the $\Psi_j$ matrices shown above can be
reproduced using the MTS package in R as follows:

> library(MTS)
> phi1=matrix(c(0.8,-0.4,0.7,0.6),2,2)      # AR coefficient matrix (filled by column)
> sig=matrix(c(4,1,1,2),2,2)                # innovation covariance matrix
> eigen(phi1)                               # eigenvalues of the AR matrix
> m1=VARMAcov(Phi=phi1, Sigma=sig, lag=5)   # covariance matrices up to lag 5
> names(m1)
[1] "autocov" "ccm"
> autocov=t(m1$autocov)                     # transpose to match this chapter's definition
> m2=PSIwgt(Phi=phi1)                       # psi-weights of the MA representation
> names(m2)
[1] "psi.weight" "irf"
> m2$psi.weight

The command VARMAcov() computes the covariance and cross-correlation matrices up to
12 lags by default. These matrices need to be transposed using the command t() since MTS
defines the lag $l$ covariance matrix $\Gamma(l)$ as $E[(Z_t - \mu)(Z_{t-l} - \mu)']$, whereas the definition
$E[(Z_{t-l} - \mu)(Z_t - \mu)']$ is used in this chapter. Transposing the matrices makes the results
from R consistent with our definition. The command eigen(phi1) included in the code
gives the eigenvalues of the matrix $\Phi$.


14.2.5 Initial Model Building and Least-Squares Estimation for VAR Models

Given an observed vector time series $Z_1, Z_2, \ldots, Z_N$ of length $N$ from a multivariate
process, the development of an appropriate VAR model for the series can be performed
iteratively using a three-stage procedure of model specification, parameter estimation, and
diagnostic checking. In the VAR case, the model specification involves choosing a suitable
value for the order $p$. Some useful tools at this stage include the sample covariance
and correlation matrices described below and the sample partial autoregression matrices
discussed, for example, by Tiao and Box (1981). The latter quantities are analogous to the
partial autocorrelations used in the univariate case and are estimated as the last autoregressive
matrix, $\Phi_{mm}$, in a VAR($m$) model with $m = 1, 2, \ldots$. The estimates $\hat{\Phi}_{mm}$ can be derived
from the Yule-Walker equations or by least-squares estimation of the parameter matrices.
Statistical tests are used to determine the significance of the estimates for each value
of $m$. The partial autoregression matrices are zero for all lags greater than $p$ and are thus
particularly useful for identifying the autoregressive model order. Additional methods for
model selection include the use of information criteria such as AIC, BIC, and HQ, as well
as methods based on canonical correlation analysis described later in this chapter.


Sample Covariance and Correlation Matrices. Given an observed time series, the sample
covariance matrix of the $Z_t$ at lag $l$ is defined as

$$\hat{\Gamma}(l) = C(l) = \frac{1}{N}\sum_{t=1}^{N-l} (Z_t - \bar{Z})(Z_{t+l} - \bar{Z})', \qquad l = 0, 1, 2, \ldots \tag{14.2.6}$$

where $\bar{Z} = (\bar{z}_1, \ldots, \bar{z}_k)' = N^{-1}\sum_{t=1}^{N} Z_t$ is the sample mean vector, which is a natural
estimator of the process mean vector $\mu = E[Z_t]$ in the stationary case. In particular,
$\hat{\Gamma}(0) = C(0) = N^{-1}\sum_{t=1}^{N} (Z_t - \bar{Z})(Z_t - \bar{Z})'$ is the sample covariance matrix of the $Z_t$.
The $(i,j)$th element of $\hat{\Gamma}(l)$ is given by

$$\hat{\gamma}_{ij}(l) = c_{ij}(l) = \frac{1}{N}\sum_{t=1}^{N-l} (z_{it} - \bar{z}_i)(z_{j,t+l} - \bar{z}_j)$$

The sample cross-correlations are defined as

$$\hat{\rho}_{ij}(l) = r_{ij}(l) = \frac{c_{ij}(l)}{\{c_{ii}(0)\,c_{jj}(0)\}^{1/2}}, \qquad i, j = 1, \ldots, k$$

For a stationary series, the $\hat{\rho}_{ij}(l)$ are sample estimates of the theoretical $\rho_{ij}(l)$. The
asymptotic sampling properties of sample correlations $\hat{\rho}_{ij}(l)$ were discussed earlier in
Section 12.1.3. The expressions for the asymptotic variances and covariances of the
estimates are complicated but simplify in certain cases. For example, in the special case where
$Z_t$ is a white noise process, the results give $\mathrm{var}[\hat{\rho}_{ij}(l)] \simeq 1/(N - l)$.

The sample cross-correlation matrices are important tools for the initial specification
of a model for the series $Z_t$. They are particularly useful in the model specification for
a low-order pure vector moving average model, which has the property that $\rho_{ij}(l) = 0$
for all $l > q$, as discussed in Section 14.3 below. However, similar to the univariate case,
a slowly decaying pattern in the estimated autocorrelation and cross-correlation matrices
would indicate that autoregressive terms are needed.
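The definitions above translate directly into base R. In the sketch below, `sample_ccov` and `sample_ccor` are hypothetical helper names, and the data are simulated white noise used only to exercise the code.

```r
# Sketch: sample lag-l covariance and correlation matrices, following
# C(l) = N^{-1} sum_{t=1}^{N-l} (Z_t - Zbar)(Z_{t+l} - Zbar)'.
# Helper names are hypothetical; Z is an N x k matrix (rows = time points).
sample_ccov <- function(Z, l) {
  Z <- as.matrix(Z)
  N <- nrow(Z)
  Zc <- sweep(Z, 2, colMeans(Z))            # subtract the sample mean vector
  crossprod(Zc[1:(N - l), , drop = FALSE],
            Zc[(1 + l):N, , drop = FALSE]) / N
}

sample_ccor <- function(Z, l) {
  d <- 1 / sqrt(diag(sample_ccov(Z, 0)))    # 1 / sqrt(c_ii(0))
  D <- diag(d, nrow = length(d))
  D %*% sample_ccov(Z, l) %*% D
}

set.seed(123)
Z <- matrix(rnorm(200), 100, 2)             # simulated white noise, N = 100
C0 <- sample_ccov(Z, 0)
C1 <- sample_ccov(Z, 1)
R1 <- sample_ccor(Z, 1)
print(round(R1, 3))
```

With this convention the $(i,j)$ element of `sample_ccor(Z, l)` pairs component $i$ at time $t$ with component $j$ at time $t + l$, matching the chapter's definition rather than the transposed one used by some packages.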

Estimation of the Partial Autoregression Matrices. Consider the vector autoregressive
model of order $m$, $Z_t = \delta + \sum_{j=1}^{m} \Phi_j Z_{t-j} + a_t$, where $\delta = (I - \Phi_1 - \cdots - \Phi_m)\mu$
accounts for the non-zero mean vector. Estimates of the partial autoregressive matrices can
be obtained from the Yule-Walker equations in (14.2.5) as

$$\hat{\Phi}_{(m)} = [\hat{\Phi}_{1m}, \ldots, \hat{\Phi}_{mm}]' = \hat{\Gamma}_m^{-1}\hat{\Gamma}_{(m)}$$

The estimate of the error covariance matrix is $\hat{\Sigma}_m = \hat{\Gamma}(0) - \sum_{j=1}^{m} \hat{\Gamma}(-j)\hat{\Phi}_{jm}'$. The
estimation is performed for $m = 1, 2, \ldots$, yielding a sequence of estimates $\hat{\Phi}_{mm}$ of the
last parameter matrix in the VAR($m$) model. These matrices are referred to as partial
autoregression matrices by Tiao and Box (1981).

An asymptotically equivalent procedure is to estimate the partial autoregression matrices
using multivariate linear least-squares (LS) estimation described, for example, by Johnson
and Wichern (2007). Using this approach, the components of $Z_t$ are regressed on the lagged
vector values $Z_{t-1}, \ldots, Z_{t-m}$, by first writing the VAR($m$) model in regression form as

$$Z_t = \delta + \sum_{j=1}^{m} \Phi_j Z_{t-j} + a_t = \delta + \Phi_{(m)}' X_t + a_t \tag{14.2.7}$$

with $X_t = (Z_{t-1}', \ldots, Z_{t-m}')'$. The LS estimates for the AR parameters are then given by

$$\hat{\Phi}_{(m)} = [\hat{\Phi}_{1m}, \ldots, \hat{\Phi}_{mm}]' = (X'X)^{-1}X'Z \tag{14.2.8}$$

where the matrices $Z$ and $X$, respectively, have typical rows $(Z_t - \bar{Z}_{(0)})'$ and

$$[(Z_{t-1} - \bar{Z}_{(1)})', \ldots, (Z_{t-m} - \bar{Z}_{(m)})'], \qquad t = m + 1, \ldots, N$$

with $\bar{Z}_{(i)} = n^{-1}\sum_{t=m+1}^{N} Z_{t-i}$ and $n = N - m$. The estimate of the error covariance matrix
$\Sigma$ is

$$\hat{\Sigma}_m = [n - (km + 1)]^{-1} S_m \tag{14.2.9}$$

where

$$S_m = \sum_{t=m+1}^{N} \hat{a}_t \hat{a}_t'$$

is the residual sum-of-squares matrix and

$$\hat{a}_t = (Z_t - \bar{Z}_{(0)}) - \sum_{j=1}^{m} \hat{\Phi}_{jm}(Z_{t-j} - \bar{Z}_{(j)})$$

are the residual vectors. These LS estimators $\hat{\Phi}_{jm}$ are also the conditional maximum
likelihood (ML) estimators under the normality assumption. Asymptotic distribution theory for
the LS estimators in the stationary VAR model was provided by Hannan (1970, Chapter 6).
Under a stationary VAR($m$) model, the distribution of $\mathrm{vec}[\hat{\Phi}_{(m)}]$ is approximately
multivariate normal with mean vector $\mathrm{vec}[\Phi_{(m)}]$ and covariance matrix estimated by $\hat{\Sigma}_m \otimes (X'X)^{-1}$,
where $\otimes$ denotes the Kronecker product of $\hat{\Sigma}_m$ and $(X'X)^{-1}$.
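The estimator (14.2.8) and the covariance estimate (14.2.9) can be sketched with mean-corrected data matrices in base R. The function `var_ls` is a hypothetical helper written for this illustration (it is not the `VAR()` function of the MTS package), and the data are simulated.

```r
# Sketch: least-squares fit of a VAR(m) following (14.2.7)-(14.2.9).
# `var_ls` is a hypothetical helper; Z is an N x k matrix.
var_ls <- function(Z, m) {
  Z <- as.matrix(Z); N <- nrow(Z); k <- ncol(Z); n <- N - m
  Y <- Z[(m + 1):N, , drop = FALSE]                  # rows Z_t', t = m+1..N
  X <- do.call(cbind, lapply(1:m, function(j)
    Z[(m + 1 - j):(N - j), , drop = FALSE]))         # rows [Z_{t-1}', ..., Z_{t-m}']
  Yc <- sweep(Y, 2, colMeans(Y))                     # subtract Zbar_(0)
  Xc <- sweep(X, 2, colMeans(X))                     # subtract Zbar_(1..m)
  B  <- solve(crossprod(Xc), crossprod(Xc, Yc))      # (X'X)^{-1} X'Z
  res <- Yc - Xc %*% B                               # residual vectors (rows)
  S  <- crossprod(res)                               # residual SS matrix S_m
  Phi <- lapply(1:m, function(j)
    t(B[((j - 1) * k + 1):(j * k), , drop = FALSE])) # Phi_jm estimates
  list(Phi = Phi, S = S, Sigma = S / (n - (k * m + 1)), n = n,
       XtXinv = solve(crossprod(Xc)))
}

set.seed(1)
Z <- matrix(rnorm(300), 150, 2)       # simulated data, for illustration only
fit <- var_ls(Z, 2)
# Approximate standard errors from Sigma_hat (x) (X'X)^{-1}
se <- sqrt(diag(kronecker(fit$Sigma, fit$XtXinv)))
print(round(fit$Phi[[2]], 3))         # estimate of the last matrix, Phi_22
```

Centering both sides is equivalent to including a constant term in the regression, which is why the divisor in (14.2.9) counts $km + 1$ parameters per equation.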


Sequential Likelihood Ratio Tests. The estimation of the partial autoregression matrices
is supplemented by likelihood ratio tests that are applied sequentially to help determine the
model order $p$ (e.g., see Tiao and Box (1981) and Reinsel (1997, Chapter 4)). Thus, after
fitting a VAR($m$) model, we test the null hypothesis $H_0: \Phi_{mm} = 0$ against the alternative
$\Phi_{mm} \neq 0$, using the likelihood ratio (LR) statistic

$$M_m = -\left(n - mk - \frac{1}{2}\right) \ln\left(\frac{|S_m|}{|S_{m-1}|}\right) \tag{14.2.10}$$

where $S_m$ is the residual sum-of-squares matrix defined above, and $n = N - m - 1$ is the
effective number of observations assuming that the model includes a constant term. For
large $n$, when $H_0: \Phi_{mm} = 0$ is true, the statistic $M_m$ has an approximate $\chi^2$ distribution
with $k^2$ degrees of freedom, and we reject $H_0$ for large values of $M_m$. The LR test statistic
in (14.2.10) is asymptotically equivalent to a Wald statistic formed in terms of the LS
estimator $\hat{\Phi}_{mm}$ of $\Phi_{mm}$.

This procedure is a natural extension of the use of the sample PACF $\hat{\phi}_{mm}$ for identification
of the order of an AR model in the univariate case as described in Section 6.2. However,
unlike the univariate case, the partial autoregression matrices are not partial autocorrelation
matrices (or correlations of any kind) in the vector case. Similar tests based on the sample
partial autocorrelation matrices, whose elements are proper correlation coefficients, are
described by Reinsel (1997, Chapter 4) and Wei (2006, Chapter 16).
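A rough sketch of the statistic (14.2.10) in base R follows. The helpers `rss_var` and `lr_stat` are hypothetical, and one implementation choice is an assumption rather than something stated in the text: both the VAR($m$) and the VAR($m-1$) fits use the same rows $t = m+1, \ldots, N$, so that the two residual sum-of-squares matrices come from nested regressions on a common sample.

```r
# Sketch: sequential LR statistic M_m of (14.2.10).
# `rss_var` and `lr_stat` are hypothetical helpers. Assumption: both fits
# use rows t = first, ..., N so the regressions are nested.
rss_var <- function(Z, m, first) {
  N <- nrow(Z)
  Y <- Z[first:N, , drop = FALSE]
  if (m == 0) return(crossprod(sweep(Y, 2, colMeans(Y))))  # constant only
  X <- do.call(cbind, lapply(1:m, function(j)
    Z[(first - j):(N - j), , drop = FALSE]))
  crossprod(resid(lm(Y ~ X)))          # residual SS matrix, constant included
}

lr_stat <- function(Z, m) {
  Z <- as.matrix(Z); N <- nrow(Z); k <- ncol(Z)
  n <- N - m - 1                        # effective sample size with a constant
  Sm  <- rss_var(Z, m, first = m + 1)
  Sm1 <- rss_var(Z, m - 1, first = m + 1)
  M <- -(n - m * k - 0.5) * log(det(Sm) / det(Sm1))
  c(M = M, df = k^2, p.value = pchisq(M, df = k^2, lower.tail = FALSE))
}

# Simulated VAR(1) data with assumed parameters, for illustration only
set.seed(42)
Phi <- matrix(c(0.8, -0.4, 0.7, 0.6), 2, 2)
Z <- matrix(0, 300, 2)
for (t in 2:300) Z[t, ] <- Phi %*% Z[t - 1, ] + rnorm(2)
out  <- lr_stat(Z, 1)                   # tests Phi_11 = 0
out3 <- lr_stat(Z, 3)                   # tests Phi_33 = 0
print(round(out, 4))
```

Software such as the MTS package may align the samples differently across orders, so values from this sketch need not match packaged output exactly.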


Use of Information Criteria. Model selection criteria such as AIC, BIC, and HQ can also
be employed for model specification. Here, AIC represents Akaike's information criterion
(Akaike, 1974a), BIC is the Bayesian information criterion due to Schwarz (1978), and
HQ is the model selection criterion proposed by Hannan and Quinn (1979); see also Quinn
(1980). These criteria are likelihood based and include under normality the determinant
of the innovations covariance matrix that reflects the goodness of fit of the model. A
second term is a function of the number of fitted parameters and penalizes models that are
unnecessarily complex. For the VAR model, we have

$$\mathrm{AIC}_m = \ln\{|\hat{\Sigma}_m|\} + 2mk^2/N$$
$$\mathrm{BIC}_m = \ln\{|\hat{\Sigma}_m|\} + mk^2\ln(N)/N$$
$$\mathrm{HQ}_m = \ln\{|\hat{\Sigma}_m|\} + 2mk^2\ln(\ln(N))/N$$

where $N$ is the sample size, $m$ is the VAR order, and $\hat{\Sigma}_m$ is the corresponding ML residual
covariance matrix estimate of $\Sigma$. It can be seen that BIC imposes a greater "penalty factor"
for the number of estimated parameters than does AIC, while HQ is intermediate between
AIC and BIC. Other similar measures include the final prediction error (FPE) criterion
suggested by Akaike (1971). These criteria can be used to compare models fitted using
maximum likelihood and the model that gives the lowest value for a given criterion would
be selected. For a discussion of the properties and performance of different model selection
criteria, see, for example, Quinn (1980) and Lütkepohl (2006).
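The three criteria differ only in the penalty term, as the following base R sketch makes explicit. The helper `var_ic` is a hypothetical name and the covariance matrix passed to it is an arbitrary illustrative value, not an estimate from any data set.

```r
# Sketch: AIC, BIC, and HQ for a VAR(m), given the ML residual covariance
# estimate. `var_ic` is a hypothetical helper.
var_ic <- function(Sigma_ml, m, N) {
  k  <- nrow(Sigma_ml)
  ld <- log(det(Sigma_ml))                    # ln |Sigma_hat_m|
  c(AIC = ld + 2 * m * k^2 / N,
    BIC = ld + m * k^2 * log(N) / N,
    HQ  = ld + 2 * m * k^2 * log(log(N)) / N)
}

Sigma_ml <- matrix(c(5.0, 1.5, 1.5, 17.0), 2, 2)   # illustrative values only
ic <- var_ic(Sigma_ml, m = 2, N = 99)
print(round(ic, 3))
```

In practice one would evaluate `var_ic` for $m = 0, 1, 2, \ldots$ with the corresponding fitted $\hat{\Sigma}_m$ and pick the order that minimizes the chosen criterion.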

14.2.6 Parameter Estimation and Model Checking 

Parameter Estimation. With the order of the VAR model specified, the model parameters
can be estimated using the least-squares procedure described above. For a stationary
process, the Yule-Walker estimates are asymptotically equivalent to the least-squares
estimates. However, when the process is nonstationary or near nonstationary, it is known that
the least-squares estimator still performs consistently, whereas the Yule-Walker estimator
may have a considerable bias. Hence, the least-squares method is generally to be
preferred (e.g., Reinsel, 1997, Section 4.4). Under the normality assumption, the least-squares
estimates are equivalent to conditional maximum likelihood estimates. Exact maximum
likelihood estimates can be derived using the unconditional likelihood function described
for VARMA models in Section 14.4.5. However, use of the conditional likelihood function
simplifies the calculations and is often adequate for VAR models in practice.

Model Checking. Model diagnostics of the estimated VAR model are primarily based on
examination of the residual vectors $\hat{a}_t$ from the estimated model and their sample covariance
matrices. The residuals $\hat{a}_t$ are calculated from (14.2.1) with the parameters replaced by their
estimates $\hat{\Phi}_j$. Useful diagnostic checks include plots of the residuals against time and/or
against other variables, and detailed examination of the cross-correlation matrices of the
residuals. Approximate two-standard-error limits can be imposed to assess the statistical
significance of the residual correlations.

In addition, overall portmanteau or "goodness-of-fit" tests based on the residual
covariance matrices at several lags can be employed for model checking; see, for example,
Hosking (1980), Li and McLeod (1981), Poskitt and Tremayne (1982), and Ali (1989).
Specifically, using $s$ lags, an overall goodness-of-fit test statistic, analogous to that
proposed by Ljung and Box (1978) for the univariate case, is given by

$$Q_s = N^2 \sum_{l=1}^{s} (N - l)^{-1}\, \mathrm{tr}[\hat{\Gamma}_a(l)\hat{\Sigma}^{-1}\hat{\Gamma}_a(l)'\hat{\Sigma}^{-1}] \tag{14.2.11}$$

where

$$\hat{\Gamma}_a(l) = N^{-1}\sum_{t=1}^{N-l} \hat{a}_t \hat{a}_{t+l}', \qquad l = 0, 1, \ldots, s$$

with $\hat{\Gamma}_a(0) \equiv \hat{\Sigma}$. Under the null hypothesis of model adequacy, the test statistic $Q_s$ is
approximately distributed as chi-squared with $k^2(s - p)$ degrees of freedom. The fitted
model is rejected as inadequate for large values of $Q_s$. Mahdi and McLeod (2012) extended
the portmanteau test of Peña and Rodríguez (2002, 2006) described in Chapter 8 to the
multivariate case and proposed a test based on the determinant of the autocorrelation matrix
of the multivariate residuals. Alternative tests such as score or Lagrange multiplier (LM)
tests have also been proposed in the literature. For a discussion of the LM tests and their
relationship to portmanteau tests, see, for example, Reinsel (1997) and Lütkepohl (2006).
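The statistic $Q_s$ in (14.2.11) can be computed from a matrix of residuals with a short loop. The sketch below is written in base R; `portmanteau_Q` is a hypothetical helper name, and simulated white noise stands in for the residuals of a fitted VAR($p$) model.

```r
# Sketch: multivariate portmanteau statistic Q_s of (14.2.11).
# `portmanteau_Q` is a hypothetical helper; res is an N x k residual matrix,
# s the number of lags, p the order of the fitted VAR model.
portmanteau_Q <- function(res, s, p) {
  res <- as.matrix(res); N <- nrow(res); k <- ncol(res)
  gam <- function(l) crossprod(res[1:(N - l), , drop = FALSE],
                               res[(1 + l):N, , drop = FALSE]) / N
  Sinv <- solve(gam(0))                       # inverse of Gamma_a(0) = Sigma_hat
  Q <- 0
  for (l in 1:s) {
    G <- gam(l)
    Q <- Q + sum(diag(G %*% Sinv %*% t(G) %*% Sinv)) / (N - l)
  }
  Q  <- N^2 * Q
  df <- k^2 * (s - p)
  c(Q = Q, df = df, p.value = pchisq(Q, df = df, lower.tail = FALSE))
}

set.seed(7)
res <- matrix(rnorm(400), 200, 2)   # stand-in residuals (simulated white noise)
out <- portmanteau_Q(res, s = 10, p = 2)
print(round(out, 4))
```

The number of lags `s` must exceed `p` for the degrees of freedom to be positive.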

14.2.7 An Empirical Example 

To illustrate the model building procedure for a vector process outlined above, we consider
the bivariate time series of U.S. fixed investment and change in business inventories.
These data are quarterly, seasonally adjusted, and are given in Lütkepohl (2006). The
fixed investment data for the time period 1947 to 1971 are shown in Figure 14.2, and the
changes in business inventories series for the same period are shown in Figure 14.3(b).
Since the investment series is clearly nonstationary, the first differences of this series,
which are displayed in Figure 14.3(a), are considered as series $z_{1t}$ together with the change
in business inventories as series $z_{2t}$, resulting in $N = 99$ quarterly observations.

Sample cross-correlation matrices of the series $Z_t = (z_{1t}, z_{2t})'$ for lags 1 through 12 are
shown in Table 14.1, and these sample autocorrelations and cross-correlations $\hat{\rho}_{ij}(l)$ are
also displayed up to 18 lags in Figure 14.4. Included in Figure 14.4 are the rough guidelines
of the two-standard-error limits $\pm 2/\sqrt{N} \simeq \pm 0.2$, which are appropriate for the $\hat{\rho}_{ij}(l)$ from
a vector white noise process as noted in Section 14.2.5. These sample correlations show
exponentially decaying and damped sinusoidal behavior as a function of lag $l$, indicative
of autoregressive dependence structure in the series.

To select a suitable model, we apply the sequential likelihood ratio test and the
three information criteria discussed above to the data. The calculations are performed
using the MTS package in R and the results are summarized in Table 14.2. We note that the
three criteria AIC$_m$, BIC$_m$, and HQ$_m$ all attain a minimum at $m = 2$. The likelihood ratio
statistic $M_m$ also supports the value $m = 2$, although a slight discrepancy occurs at $m = 4$.
These results therefore indicate that, among pure autoregressive models, a second-order
VAR(2) model may be the most appropriate for these data.



FIGURE 14.2 Quarterly (seasonally adjusted) U.S. fixed investment data for 1947 through 1971.

FIGURE 14.3 Quarterly (seasonally adjusted) first differences of U.S. fixed investment data and
changes in business inventories data (in billions) for the period 1947 through 1971: (a) $z_{1t}$: first
differences of investment series, $z_{1t} = z_{1t}^{*} - z_{1,t-1}^{*}$; and (b) $z_{2t}$: changes in business inventories series.

TABLE 14.1 Sample Correlation Matrices $\hat{\rho}(l)$ for the Bivariate Quarterly Series of First
Differences of U.S. Fixed Investment and U.S. Changes in Business Inventories

l              1              2              3              4              5              6
rho(l)     0.47  0.27     0.10  0.35    -0.12  0.29    -0.31  0.27    -0.30  0.19    -0.21  0.04
          -0.06  0.68    -0.33  0.50    -0.29  0.32    -0.21  0.07    -0.10  0.07     0.10  0.01

l              7              8              9             10             11             12
rho(l)    -0.14 -0.04    -0.09 -0.11     0.13 -0.03     0.19  0.07     0.13  0.12     0.02  0.20
           0.15  0.04     0.20  0.05     0.12  0.05     0.05  0.11     0.01  0.09    -0.04  0.05



FIGURE 14.4 Sample auto- and cross-correlations $\hat{\rho}_{ij}(l)$ for the bivariate series of first differences
of U.S. fixed investment and U.S. changes in business inventories: (a) sample autocorrelations $\hat{\rho}_{11}(l)$
and $\hat{\rho}_{22}(l)$ and (b) sample cross-correlations $\hat{\rho}_{12}(l)$.

TABLE 14.2 Order Selection Statistics for the U.S. Business Investment and Inventories Data

m (VAR Order)    AIC_m    BIC_m    HQ_m     M_m       p-Value
0                5.539    5.539    5.539     0.000    0.000
1                4.723    4.828    4.766    73.997    0.000
2                4.597    4.807    4.682    16.652    0.002
3                4.659    4.974    4.786     1.483    0.830
4                4.614    5.033    4.784     9.628    0.047
5                4.624    5.148    4.836     5.283    0.260
6                4.703    5.332    4.958     0.113    0.999
7                4.759    5.493    5.056     1.785    0.775
8                4.785    5.623    5.124     3.755    0.440


The LS estimates from the AR(2) model (with estimated standard errors in parentheses),
as well as the ML estimate of $\Sigma$, are given as

$$\hat{\Phi}_1 = \begin{bmatrix} 0.504 & 0.108 \\ (0.096) & (0.056) \\ 0.345 & 0.531 \\ (0.177) & (0.103) \end{bmatrix}, \qquad
\hat{\Phi}_2 = \begin{bmatrix} -0.146 & -0.205 \\ (0.099) & (0.054) \\ 0.256 & 0.139 \\ (0.181) & (0.099) \end{bmatrix}$$

$$\hat{\Sigma} = \begin{bmatrix} 5.0270 & 1.6958 \\ 1.6958 & 16.9444 \end{bmatrix}$$

with $|\hat{\Sigma}| = 82.3032$. The estimates of the two constant terms are 1.217 and 1.527, with
respective standard errors of 0.354 and 0.650. In the matrix $\hat{\Phi}_2$, the coefficient estimate
in the (1,2) position is statistically significant, while the rest are insignificant and might
perhaps be omitted.

We now examine the residuals $\hat{a}_t$ from the fitted VAR(2) model. The residual
autocorrelations and cross-correlations are displayed in Figure 14.5. The approximate
two-standard-error limits are also included in the graphs. The individual elements of the
residual correlation matrices are generally quite small for all lags through $l = 12$, with
$|\hat{\rho}_{\hat{a},ij}(l)| \ll 2/\sqrt{N} = 0.2$ in nearly all cases. One notable feature of these residual
correlations, however, is the (marginally) significant correlation of $\hat{\rho}_{\hat{a},22}(4) = -0.20$ at lag 4 for
the second residual series $\hat{a}_{2t}$ (see lower right panel of Figure 14.5). This feature, which
also appears visible from the $p$-values of the portmanteau test shown in Figure 14.6, may be
a consequence of the seasonal adjustment procedure, related to a weak seasonal structure
that may still exist in the quarterly ("seasonally adjusted") series $Z_t$. To accommodate
this feature, we could consider a modification to the VAR(2) model by inclusion of an MA
coefficient matrix $\Theta_4$ at the quarterly seasonal lag of 4 in the model. Although this could
lead to a small improvement, we do not pursue this modification here.

FIGURE 14.5 Cross-correlation matrices for the residuals from the VAR(2) model fitted to the
U.S. business investment and inventories data.

As a benchmark for comparison against the bivariate AR(2) model fitted above,
comparable univariate models for $z_{1t}$ and $z_{2t}$ that were found to be adequate, estimated by the
conditional ML method, were obtained as

$$(1 - 1.275B + 0.545B^2)z_{1t} = 0.251 + (1 - 0.769B)\varepsilon_{1t}$$

with $\hat{\sigma}_{\varepsilon_1}^2 = 5.44$, and $(1 - 0.690B)z_{2t} = 1.808 + \varepsilon_{2t}$, with $\hat{\sigma}_{\varepsilon_2}^2 = 19.06$. Note that the
residual variances are slightly larger in this case. The fitted bivariate models imply that the
changes in business inventories series $z_{2t}$ have a modest but significant influence on the
(first differences of) investments $z_{1t}$, but there appears to be less influence in the feedback
from investments to the changes in inventories series. In addition, there is only a small
degree of contemporaneous correlation suggested, since the correlation between the residual
series $\hat{a}_{1t}$ and $\hat{a}_{2t}$ in the bivariate models estimated from $\hat{\Sigma}$ equals 0.184.

Remark. The bivariate analysis described above was performed using the multivariate time
series package MTS in R. Letting zz denote the data after differencing the investments
series, the relevant commands are

> ccm(zz)               # Cross-correlation analysis
> m1=VARorder(zz)       # Order selection
> m2=VAR(zz,2)          # Estimation of VAR(2) model
> MTSdiag(m2)           # Model checking
> ccm(m2$residuals)     # Residual cross-correlation analysis

For more detailed discussion and for demonstrations of the analysis capabilities of the MTS
package in R, see Tsay (2014). Multivariate time series tools are also available in other
packages such as the SCA package released by Scientific Computing Associates Corp.,
and the S-Plus software package available from TIBCO Software, Inc.

FIGURE 14.6 Plot of p-values of the multivariate portmanteau statistic applied to the residuals
from the fitted VAR(2) model.

14.3 VECTOR MOVING AVERAGE MODELS 

The vector autoregressive models described above provide an adequate representation for
many applied time series and are widely used in practice. However, pure autoregressive
models have a disadvantage in that the model order needed to obtain a satisfactory
representation can in some cases be rather high. Analogous to the univariate case, a more
parsimonious representation can sometimes be achieved by adding moving average terms
to the model. This would result in the vector ARMA (or VARMA) model form mentioned
briefly in Section 14.1.4. Aggregation of vector series across time or in space also creates
a need for VARMA models, as noted, for example, by Lütkepohl and Poskitt (1996). In addition,
trend or seasonal adjustments may change the dependence structure and make a pure VAR
model inadequate (e.g., Maravall, 1993). Prior to discussing the VARMA model in more
detail, we will briefly examine the special case when no autoregressive terms are present
and the series follows a pure moving average model.

14.3.1 Vector MA(q) Model

A vector moving average model of order $q$, or VMA($q$) model, is defined as

$$Z_t = \mu + a_t - \sum_{j=1}^{q} \Theta_j a_{t-j} \tag{14.3.1}$$

or equivalently, $Z_t = \mu + \Theta(B)a_t$, where $\mu$ is the mean of the process, $\Theta(B) = I - \Theta_1 B -
\cdots - \Theta_q B^q$ is a matrix polynomial of order $q$, and the $\Theta_i$ are $k \times k$ matrices with $\Theta_q \neq 0$.

Invertibility. A vector MA(q) process is said to be invertible if it can be represented in the 
form 

$$ (Z_t - \mu) - \sum_{j=1}^{\infty} \Pi_j (Z_{t-j} - \mu) = a_t \qquad (14.3.2) $$

or equivalently as $\Pi(B)(Z_t - \mu) = a_t$, where $\Pi(B) = I - \sum_{j=1}^{\infty} \Pi_j B^j$, with 
$\sum_{j=1}^{\infty} \|\Pi_j\| < \infty$. The process is invertible if all the roots of $\det\{\Theta(B)\} = 0$ 
are greater than one in absolute value. The process then has the infinite VAR representation 
given by (14.3.2) with $\Pi(B) = \Theta^{-1}(B)$ so that $\Theta(B)\Pi(B) = I$. As in the univariate case, 
this form is particularly useful for determining how forecasts of future observations depend 
on current and past values of the k series. 

Moment Equations. For the VMA(q) model, the covariance matrices $\Gamma(l)$ are given by 

$$ \Gamma(l) = \sum_{h=0}^{q-l} \Theta_h \Sigma \Theta_{h+l}' \qquad (14.3.3) $$

for $l = 0, 1, \ldots, q$, with $\Theta_0 = -I$, and $\Gamma(l) = 0$ for $l > q$. The result is readily 
verified since the $\{a_t\}$ form a white noise sequence and $\mathrm{Cov}[\Theta a_t] = \Theta \Sigma \Theta'$. 

14.3.2 Special Case: Vector MA(1) Model 

To examine the properties further, we consider the VMA(1) model, $Z_t - \mu = a_t - \Theta a_{t-1}$. 
From the same reasoning as given concerning the stationarity condition for the VAR(1) 
process, the invertibility condition for the VMA(1) model is equivalent to all eigenvalues 
of $\Theta$ being less than one in absolute value. Then we have the convergent infinite VAR 
representation (14.3.2) with infinite VAR coefficient matrices $\Pi_j = -\Theta^j$, $j \geq 1$. This 
follows since $\Theta(B)\Pi(B) = I$ now simplifies to $\Pi_j = \Theta \Pi_{j-1} = \Theta^j \Pi_0$ with $\Pi_0 = -I$. 
Also, from (14.3.3) the covariance matrices of the VMA(1) process simplify to 

$$ \Gamma(0) = \Sigma + \Theta \Sigma \Theta', \qquad \Gamma(1) = -\Sigma \Theta' = \Gamma(-1)' $$

and $\Gamma(l) = 0$ for $|l| > 1$. Thus, as in the univariate MA(1) case, all covariances are zero for 
lags greater than one. 

14.3.3 Numerical Example 

Consider the bivariate (k = 2) VMA(1) model $Z_t = (I - \Theta B) a_t$ with 

$$ \Theta = \begin{bmatrix} 0.8 & 0.7 \\ -0.4 & 0.6 \end{bmatrix} \quad \text{and} \quad \Sigma = \begin{bmatrix} 4 & 1 \\ 1 & 2 \end{bmatrix} $$

Similar to results for the VAR(1) example, the roots of $\det\{\lambda I - \Theta\} = \lambda^2 - 1.4\lambda + 0.76 = 0$ 
are $\lambda = 0.7 \pm 0.5196i$, with absolute value equal to $(0.76)^{1/2}$; hence, the VMA(1) model 
is invertible. The coefficient matrices $\Pi_j = -\Theta^j$ in the infinite VAR form are of the same 
magnitudes as the $\Psi_j$ coefficient matrices in the previous AR(1) example. The covariance 
matrices of the MA(1) at lags 0 and 1 are 

$$ \Gamma(0) = \Sigma + \Theta \Sigma \Theta' = \begin{bmatrix} 8.66 & 0.76 \\ 0.76 & 2.88 \end{bmatrix} \quad \text{and} \quad \Gamma(1) = -\Sigma \Theta' = \begin{bmatrix} -3.9 & 1.0 \\ -2.2 & -0.8 \end{bmatrix} $$

with corresponding correlation matrices 

$$ \rho(0) = V^{-1/2} \Gamma(0) V^{-1/2} = \begin{bmatrix} 1.000 & 0.152 \\ 0.152 & 1.000 \end{bmatrix} \quad \text{and} \quad \rho(1) = \begin{bmatrix} -0.450 & 0.200 \\ -0.441 & -0.278 \end{bmatrix} $$


The above calculations are conveniently performed in R as follows: 

> library(MTS) 
> theta1=matrix(c(0.8,-0.4,0.7,0.6),2,2)   # Theta, filled column by column 
> sig=matrix(c(4,1,1,2),2,2)               # Sigma 
> eigen(theta1)                            # eigenvalues of Theta (invertibility check) 
> PIwgt(Theta=theta1)                      # pi-weights of the infinite VAR form 
> m1=VARMAcov(Theta=theta1, Sigma=sig, lag=1) 
> names(m1) 
[1] "autocov" "ccm" 
> autocov=t(m1$autocov) 
> autocorr=t(m1$ccm) 


For the bivariate MA(1) model, it follows from the autocovariance structure that each series 
has a univariate MA(1) model representation as $z_{it} = (1 - \eta_i B)\varepsilon_{it}$, 
$\sigma^2_{\varepsilon_i} = \mathrm{var}(\varepsilon_{it})$. From Appendix A4.3, the parameter values $\eta_i$ and 
$\sigma^2_{\varepsilon_i}$ of the component series can be determined directly by solving the relations 
$\rho_{ii}(1) = -\eta_i/(1 + \eta_i^2)$, $\gamma_{ii}(0) = \sigma^2_{\varepsilon_i}(1 + \eta_i^2)$, $i = 1, 2$, which 
lead to the values $\eta_1 = 0.628$, $\sigma^2_{\varepsilon_1} = 6.211$, and $\eta_2 = 0.303$, 
$\sigma^2_{\varepsilon_2} = 2.637$, respectively. 
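
As a rough check on these component values, the two relations can be solved in base R. This is only a sketch: the variable names (gam0, gam1, rho1, eta, sig2_eps) are illustrative and not from the text, and the invertible root of the quadratic $\rho \eta^2 + \eta + \rho = 0$ is selected by hand. 

```r
# Theta and Sigma from the numerical example (matrices filled column by column)
theta1 <- matrix(c(0.8, -0.4, 0.7, 0.6), 2, 2)
sig    <- matrix(c(4, 1, 1, 2), 2, 2)

# VMA(1) covariance matrices: Gamma(0) = Sigma + Theta Sigma Theta', Gamma(1) = -Sigma Theta'
gam0 <- sig + theta1 %*% sig %*% t(theta1)
gam1 <- -sig %*% t(theta1)

# Lag-1 autocorrelations rho_ii(1) of the two component series
rho1 <- diag(gam1) / diag(gam0)

# rho = -eta / (1 + eta^2) is equivalent to rho*eta^2 + eta + rho = 0;
# this root is the one with |eta| < 1 (the invertible solution)
eta <- (-1 + sqrt(1 - 4 * rho1^2)) / (2 * rho1)

# gamma_ii(0) = sigma^2 * (1 + eta^2) gives the innovation variances
sig2_eps <- diag(gam0) / (1 + eta^2)

print(round(eta, 3))
print(round(sig2_eps, 3))
```

The other root of each quadratic is the reciprocal of the one kept here and corresponds to the noninvertible MA(1) representation with the same autocorrelation. 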


14.3.4 Model Building for Vector MA Models 

The model building tools discussed for VAR models in Section 14.2 extend in a 
straightforward way to moving average models. As noted, the estimated cross-covariance 
and cross-correlation matrices are particularly useful for specifying the model order q 
since from (14.3.3) the corresponding theoretical quantities are zero for lags greater than 
q. The partial autoregression matrices, on the other hand, would show a decaying pattern 
for a moving average process. The parameter estimates can be obtained using the 
least-squares method, which is equivalent to the conditional likelihood method under the 
normality assumption. However, analogous to the univariate case, the unknown presample values 
can have a larger impact on the parameter estimates for VMA models. In particular, if the 
true parameter values are close to the boundary of the invertibility region, the conditional 
likelihood approach can result in biased estimates, especially for relatively short series. 
Because of this, the use of the unconditional likelihood function is typically recommended 
for models with moving average terms. We return to parameter estimation in Section 
14.4.5, where the exact likelihood function is discussed for the general VARMA case. 


14.4 VECTOR AUTOREGRESSIVE-MOVING AVERAGE MODELS 

We now assume that the matrix $\Psi(B)$ can be represented as the product $\Psi(B) = 
\Phi^{-1}(B)\Theta(B)$, where $\Phi(B)$ and $\Theta(B)$ are the autoregressive and moving average matrix 
polynomials defined above. This leads to the vector model 

$$ (Z_t - \mu) - \sum_{j=1}^{p} \Phi_j (Z_{t-j} - \mu) = a_t - \sum_{j=1}^{q} \Theta_j a_{t-j} \qquad (14.4.1) $$

where $a_t$ again is a vector white noise process with mean vector 0 and covariance matrix 
$\Sigma = E[a_t a_t']$. The resulting process $\{Z_t\}$ is referred to as a vector autoregressive-moving 
average, or VARMA(p, q), process regardless of whether $\{Z_t\}$ is stationary or not. 

As for the VAR(p) model, the VARMA(p, q) process can be expressed in structural 
form by premultiplying both sides of (14.4.1) by a lower triangular matrix $\Phi_0^{\#}$ with ones on 
the diagonal such that $\Phi_0^{\#} \Sigma \Phi_0^{\#\prime} = \Sigma^{\#}$ is a diagonal matrix with positive diagonal elements. 
This gives the following representation: 

$$ \Phi_0^{\#}(Z_t - \mu) - \sum_{j=1}^{p} \Phi_j^{\#} (Z_{t-j} - \mu) = b_t - \sum_{j=1}^{q} \Theta_j^{\#} b_{t-j} \qquad (14.4.2) $$

where $\Phi_j^{\#} = \Phi_0^{\#}\Phi_j$, $\Theta_j^{\#} = \Phi_0^{\#}\Theta_j\Phi_0^{\#-1}$, and $b_t = \Phi_0^{\#} a_t$. This model displays the concurrent 
dependence among the components of $Z_t$ through the lower triangular matrix $\Phi_0^{\#}$, with 
diagonal covariance matrix $\Sigma^{\#}$ for the errors $b_t$, whereas the standard or reduced form (14.4.1) 
places the concurrent relationships in the covariance matrix $\Sigma$ of the errors. More generally, 
premultiplication of (14.4.1) by an arbitrary nonsingular matrix $\Phi_0^{\#}$ yields a form similar to (14.4.2) that is 
useful in some cases. For example, representation of a VARMA model in this general form, 
but with a special structure imposed on the parameter matrices, will sometimes be more 
useful for model specification than the standard form (14.4.1). This is discussed further in 
Section 14.7. 
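
One way to obtain a unit lower triangular matrix that diagonalizes $\Sigma$ is through the Cholesky factor. The following base R sketch uses the $\Sigma$ of the numerical example in Section 14.3.3 purely for illustration; the names phi0 and sig_sharp are assumptions of this sketch and stand for $\Phi_0^{\#}$ and $\Sigma^{\#}$. 

```r
sig <- matrix(c(4, 1, 1, 2), 2, 2)     # illustrative innovation covariance matrix

# chol() returns an upper triangular R with t(R) %*% R = sig
Lc <- t(chol(sig))                     # lower triangular Cholesky factor
d  <- diag(Lc)

# Rescale columns so the factor has ones on the diagonal: sig = L1 D L1'
L1 <- Lc %*% diag(1 / d)

# The inverse of a unit lower triangular matrix is unit lower triangular
phi0 <- solve(L1)

# Covariance of b_t = phi0 %*% a_t; diagonal, with entries d^2
sig_sharp <- phi0 %*% sig %*% t(phi0)

print(phi0)
print(round(sig_sharp, 10))
```

For this $\Sigma$ the second row of phi0 is (-0.25, 1), so the second structural error is the second innovation minus its regression on the first. 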


14.4.1 Stationarity and Invertibility Conditions 

The stationarity conditions for a VARMA(p, q) process are the same as for the VAR(p) 
process discussed in Section 14.2. Hence it can be shown that the process is stationary 
and has an infinite moving average representation $Z_t = \mu + \sum_{j=0}^{\infty} \Psi_j a_{t-j}$ if all the roots 
of $\det\{\Phi(B)\} = 0$ are greater than one in absolute value. The coefficient matrices $\Psi_j$ are 
determined from the relation $\Phi(B)\Psi(B) = \Theta(B)$, and satisfy the recursion 

$$ \Psi_j = \Phi_1 \Psi_{j-1} + \Phi_2 \Psi_{j-2} + \cdots + \Phi_p \Psi_{j-p} - \Theta_j \qquad j = 1, 2, \ldots \qquad (14.4.3) $$

where $\Psi_0 = I$, $\Psi_j = 0$ for $j < 0$, and $\Theta_j = 0$ for $j > q$. 
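
The recursion (14.4.3) is straightforward to program. The sketch below is in base R; the function name psi_weights and the AR matrix phi1 are hypothetical choices made for illustration, while theta1 is the MA matrix of the example in Section 14.3.3. 

```r
# Psi-weight recursion (14.4.3).
# Phi, Theta: lists of k x k coefficient matrices (either list may be empty).
# Returns a list whose element j + 1 holds Psi_j, for j = 0, ..., nlag.
psi_weights <- function(Phi, Theta, nlag) {
  k <- if (length(Phi) > 0) nrow(Phi[[1]]) else nrow(Theta[[1]])
  p <- length(Phi)
  q <- length(Theta)
  Psi <- vector("list", nlag + 1)
  Psi[[1]] <- diag(k)                               # Psi_0 = I
  for (j in seq_len(nlag)) {
    # -Theta_j, with Theta_j = 0 for j > q
    acc <- if (j <= q) -Theta[[j]] else matrix(0, k, k)
    # Phi_i Psi_{j-i}, using Psi_j = 0 for j < 0
    for (i in seq_len(min(j, p))) {
      acc <- acc + Phi[[i]] %*% Psi[[j - i + 1]]
    }
    Psi[[j + 1]] <- acc
  }
  Psi
}

phi1   <- matrix(c(0.5, 0.2, 0.1, 0.3), 2, 2)       # hypothetical AR(1) matrix
theta1 <- matrix(c(0.8, -0.4, 0.7, 0.6), 2, 2)      # MA(1) matrix from Section 14.3.3
Psi <- psi_weights(list(phi1), list(theta1), nlag = 5)
print(Psi[[2]])                                     # Psi_1 = Phi_1 - Theta_1
```

For a VARMA(1, 1) model the recursion gives $\Psi_1 = \Phi_1 - \Theta_1$ and $\Psi_j = \Phi_1 \Psi_{j-1}$ for $j \geq 2$, which provides a simple check on the output. 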

Conversely, the VARMA(p, q) process is invertible with an infinite AR representation 
similar to (14.3.2) if all the roots of $\det\{\Theta(B)\} = 0$ are greater than one in absolute value. 
The coefficient weights $\Pi_j$ in the infinite AR representation are given by the relation 
$\Theta(B)\Pi(B) = \Phi(B)$, and satisfy the recursion 

$$ \Pi_j = \Theta_1 \Pi_{j-1} + \Theta_2 \Pi_{j-2} + \cdots + \Theta_q \Pi_{j-q} + \Phi_j \qquad j = 1, 2, \ldots \qquad (14.4.4) $$

where $\Pi_0 = -I$, $\Pi_j = 0$ for $j < 0$, and $\Phi_j = 0$ for $j > p$. 

In addition, using the moving average representation, the covariance matrices for $Z_t$ 
can be written as $\Gamma(l) = \sum_{j=0}^{\infty} \Psi_j \Sigma \Psi_{j+l}'$, $l \geq 0$. From this it follows that the covariance 
matrix-generating function is given by $G(z) = \sum_{l=-\infty}^{\infty} \Gamma(l) z^l = \Psi(z^{-1}) \Sigma \Psi(z)'$; hence, the 
spectral density matrix of the VARMA(p, q) process is given as in (A14.1.7) with $\Psi(z) = 
\Phi^{-1}(z)\Theta(z)$. 


14.4.2 Covariance Matrix Properties of VARMA Models 

For the general stationary VARMA(p, q) process $\{Z_t\}$, it follows from the infinite MA 
representation $Z_t = \mu + \sum_{j=0}^{\infty} \Psi_j a_{t-j}$ that 

$$ E[(Z_{t-l} - \mu)\, a_{t-j}'] = \begin{cases} 0 & \text{for } j < l \\ \Psi_{j-l}\Sigma & \text{for } j \geq l \end{cases} $$

Therefore, it is easy to determine from (14.4.1) that the covariance matrices $\Gamma(l) = 
E[(Z_{t-l} - \mu)(Z_t - \mu)']$ of $\{Z_t\}$ satisfy the relations 

$$ \Gamma(l) = \sum_{j=1}^{p} \Gamma(l - j)\Phi_j' - \sum_{j=l}^{q} \Psi_{j-l} \Sigma \Theta_j' \qquad l = 0, 1, \ldots, q \qquad (14.4.5) $$

and $\Gamma(l) = \sum_{j=1}^{p} \Gamma(l - j)\Phi_j'$ for $l > q$, with the convention that $\Theta_0 = -I$. Thus, the $\Gamma(l)$ 
can be evaluated in terms of the AR and MA parameter matrices $\Phi_j$ and $\Theta_j$, and $\Sigma$, using 
these recursions. 


14.4.3 Nonuniqueness and Parameter Identifiability for VARMA Models 

Although the VARMA(p, q) model appears to be a straightforward extension of the univariate 
ARMA(p, q) model, a number of issues are associated with this extension. For example, 
since each AR or MA term contributes $k \times k$ parameters, the total number of parameters 
in the model increases rapidly as the order increases. The overflow of parameters, whose 
estimates can be highly correlated, makes the interpretation of the modeled results very 
difficult. An additional problem that arises in the VARMA case relates to the nonuniqueness 
of the parameters and the lack of an identifiable model representation. This issue 
does not arise for the pure VAR(p) model or the pure VMA(q) model discussed earlier 
in this chapter. But in the vector case it is possible to have two ARMA representations, 
$\Phi(B)Z_t = \Theta(B)a_t$ and $\Phi_*(B)Z_t = \Theta_*(B)a_t$ with different parameters, that give rise to 
the same coefficients $\Psi_j$ in the infinite MA representation, such that 

$$ \Psi(B) = \Phi^{-1}(B)\Theta(B) = \Phi_*^{-1}(B)\Theta_*(B) $$

Thus, the two models also give rise to the same covariance matrix structure $\{\Gamma(l)\}$ and 
hence the same process. 

Two VARMA models with this property are said to be observationally equivalent, or 
the models are said to be exchangeable. As a basic example, the bivariate VARMA(1, 1) 
model $(I - \Phi_* B)Z_t = (I - \Theta_* B)a_t$ with parameters 

$$ \Phi_* = \begin{bmatrix} 0 & \alpha \\ 0 & 0 \end{bmatrix}, \qquad \Theta_* = \begin{bmatrix} 0 & \beta \\ 0 & 0 \end{bmatrix} $$

is observationally equivalent to both a VAR(1) model $(I - \Phi B)Z_t = a_t$ and a VMA(1) 
model $Z_t = (I - \Theta B)a_t$, with 

$$ \Phi = \begin{bmatrix} 0 & (\alpha - \beta) \\ 0 & 0 \end{bmatrix} = -\Theta $$

since, for example, $(I - \Phi_* B)^{-1}(I - \Theta_* B) = (I + \Phi_* B)(I - \Theta_* B) = (I - \Theta B)$. Hence, 
the parameters $\Phi_*$ and $\Theta_*$ in the ARMA(1, 1) model representation are not identifiable, 
since the properties of the process depend only on the value of $\alpha - \beta$. 
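
The equivalence in this example can be confirmed numerically by comparing the $\Psi$ weights of the three representations. The base R sketch below uses arbitrary illustrative values for $\alpha$ and $\beta$ (they are not from the text) and relies on the fact that, for first-order models, $\Psi_1 = \Phi_1 - \Theta_1$ and $\Psi_j = \Phi_1 \Psi_{j-1}$ for $j \geq 2$. 

```r
alpha <- 0.9; beta <- 0.4                        # arbitrary illustrative values

# Matrices are filled column by column, so these are [0 alpha; 0 0] and [0 beta; 0 0]
phi_star   <- matrix(c(0, 0, alpha, 0), 2, 2)
theta_star <- matrix(c(0, 0, beta, 0), 2, 2)

# VARMA(1,1): Psi_1 = Phi* - Theta*, Psi_2 = Phi* Psi_1
psi1_arma <- phi_star - theta_star
psi2_arma <- phi_star %*% psi1_arma

# Equivalent VAR(1) and VMA(1) parameters: Phi = [0 (alpha - beta); 0 0] = -Theta
phi   <- matrix(c(0, 0, alpha - beta, 0), 2, 2)
theta <- -phi

psi1_var <- phi                                  # VAR(1): Psi_j = Phi^j
psi2_var <- phi %*% phi
psi1_vma <- -theta                               # VMA(1): Psi_1 = -Theta, Psi_j = 0 for j >= 2
psi2_vma <- matrix(0, 2, 2)

print(psi1_arma)
print(psi2_arma)                                 # zero matrix: the weights cut off after lag 1
```

All three sets of weights agree, and they vanish from lag 2 onward because $\Phi_*$, $\Theta_*$, and $\Phi$ are nilpotent. 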

In general, observationally equivalent ARMA(p, q) representations can exist because 
matrix AR and MA operators could be related by a common left matrix factor $U(B)$ as 

$$ \Phi_*(B) = U(B)\Phi(B) \quad \text{and} \quad \Theta_*(B) = U(B)\Theta(B) $$

but such that the orders of $\Phi_*(B)$ and $\Theta_*(B)$ are not increased over those of $\Phi(B)$ and 
$\Theta(B)$. This common left factor $U(B)$ would cancel when $\Phi_*^{-1}(B)\Theta_*(B)$ is formed, resulting 
in the same parameter matrices in $\Psi(B)$. A particular ARMA model specification and its 
parameters are said to be identifiable if the $\Phi_j$ and the $\Theta_j$ are uniquely determined by the 
set of impulse response matrices $\Psi_j$ in the infinite MA representation, or equivalently by 
the set of covariance matrices $\Gamma(l)$ in the stationary case. 

For the mixed VARMA(p, q) model, certain conditions are needed on the matrix operators 
$\Phi(B)$ and $\Theta(B)$ to ensure uniqueness of the parameters in the ARMA representation. 
In addition to the stationarity and invertibility conditions, the following two conditions are 
sufficient for identifiability: 


1. The matrices $\Phi(B)$ and $\Theta(B)$ have no common left factors other than unimodular 
ones. That is, if $\Phi(B) = U(B)\Phi_1(B)$ and $\Theta(B) = U(B)\Theta_1(B)$, then the common 
factor $U(B)$ must be unimodular, that is, $\det\{U(B)\}$ is a nonzero constant. When this 
property holds, $\Phi(B)$ and $\Theta(B)$ are called left-coprime. 

2. With q as small as possible and p as small as possible for that q, the joint matrix 
$[\Phi_p, \Theta_q]$ must be of rank k, the dimension of $Z_t$. 

Notice that through the relation $U(B)^{-1} = [1/\det\{U(B)\}]\,\mathrm{adj}\{U(B)\}$, the operator $U(B)$ 
is a unimodular matrix if and only if $U(B)^{-1}$ is a matrix polynomial of finite order. The 
operator $U(B) = I - \Phi_* B$ in the simple ARMA(1, 1) example above is an illustration 
of a unimodular matrix. For further discussion of the identifiability conditions for the 
VARMA(p, q) model, see, for example, Hannan and Deistler (1988, Chapter 2) and Reinsel 
(1997, Chapter 2). 


14.4.4 Model Specification for VARMA Processes 

The model specification tools discussed for VAR(p) models in Section 14.2 extend in 
principle to the VARMA case. This includes the examination of the cross-correlation and 
partial autoregression matrices as discussed by Tiao and Box (1981). Additional tools 
include the information criteria for model specification examined earlier, and the use 
of extended cross-correlation matrices for VARMA models discussed by Tiao and Tsay 
(1983). However, because of the identifiability issue and the overflow of parameters in the 
vector case, additional model specification tools focusing on the parameter structure of the 
VARMA representation are now needed. 


Kronecker Indices. Beyond the specification of overall orders p and q, the structure of the 
VARMA(p, q) model can be characterized by a set of Kronecker indices $K_1, \ldots, K_k$ and 
the McMillan degree $M = \sum_{i=1}^{k} K_i$ of the process. The Kronecker indices, also known as 
structural indices, represent the maximal row degrees of the individual equations of the 
VARMA model. The use of these indices leads to the specification of a VARMA process of 
order $p = q = \max\{K_i\}$ with certain simplifying structure in the parameter matrices $\Phi_j$ and 
$\Theta_j$. A Kronecker index equal to $K_i$, in particular, implies that a VARMA representation 
can be constructed for the process such that the ith rows of the matrices $\Phi_j$ and $\Theta_j$ are 
zero for $j > K_i$ and with zero constraints imposed on certain other elements of $\Phi_j$. The 
resulting model is referred to as the echelon canonical form of the VARMA model. The set 
of Kronecker indices is unique for a given VARMA process and the identifiability issue 
discussed above is thus avoided. The echelon form structure and identifiability conditions 
in terms of the echelon form have been examined extensively by Hannan and Deistler 
(1988) and others. 

The Kronecker indices can be estimated using canonical correlation analysis methods 
introduced by Akaike (1976) and further elaborated upon by Cooper and Wood (1982) and 
Tsay (1989a). These methods, which are extensions of the canonical correlation analysis 
procedures discussed for the univariate case in Section 6.2.4, are employed to determine 
the nonzero canonical correlations between the past and present values of the process, 
$\{Z_{t-j}, j \geq 0\}$, and the future values $\{Z_{t+j}, j > 0\}$. In this way, the Kronecker indices 
$K_i$ can be deduced, which then provide the overall model order as well as the maximum 
order of the AR and MA polynomials for each individual component. Further details of 
this approach will be given in Section 14.7. More extensive accounts of the Kronecker 
index approach to model specification have been provided by Solo (1986), Reinsel (1997), 
Lütkepohl (2006), and Tsay (1989b, 1991, 2014), among others. 


Scalar Component Models. Tiao and Tsay (1989) proposed an alternative way to identify 
the order structure of the VARMA model based on the concept of scalar component models 
(SCMs). This approach examines linear combinations of the observed series with the goal 
of arriving at a parsimonious model representation that overcomes the identification issue 
and that may reveal meaningful structures in the data. Using this approach, k independent 
linear combinations $y_{it} = v_i' Z_t$ of orders $(p_i, q_i)$, $i = 1, \ldots, k$, are sought such that the orders 
$p_i + q_i$ are as small as possible. Given a k-dimensional VARMA(p, q) process, a nonzero 
linear combination $y_t = v' Z_t$ follows SCM$(p_1, q_1)$ if 

$$ y_t - \sum_{j=1}^{p_1} v' \Phi_j Z_{t-j} = v' a_t - \sum_{j=1}^{q_1} v' \Theta_j a_{t-j} $$

where $0 \leq p_1 \leq p$, $0 \leq q_1 \leq q$, and $u_t = y_t - \sum_{j=1}^{p_1} v' \Phi_j Z_{t-j}$ is uncorrelated with $a_{t-j}$ for 
$j > q_1$. Notice that the scalar component $y_t$ depends only on lags 1 to $p_1$ of all variables 
$Z_t$, and lags 1 to $q_1$ of all the innovations $a_t$. Starting from SCM(0, 0), the SCM method 
uses a sequence of canonical correlation tests to discover k such linear combinations. 

Once such a set has been found, the specification of the ARMA structure for $Z_t$ can be 
determined through the relations 

$$ T Z_t - \sum_{j=1}^{p} G_j Z_{t-j} = T a_t + \sum_{j=1}^{q} H_j a_{t-j} \qquad (14.4.6) $$

where $T = [v_1, \ldots, v_k]'$ is a $k \times k$ nonsingular matrix, $G_j = T\Phi_j$, $j = 1, \ldots, p$, 
$H_j = -T\Theta_j$, $j = 1, \ldots, q$, $p = \max\{p_i\}$, and $q = \max\{q_i\}$. Moreover, the ith row of $G_j$ is specified 
to be zero for $j > p_i$ and the ith row of $H_j$ is zero for $j > q_i$. Premultiplication of (14.4.6) 
by $T^{-1}$ thus leads to a VARMA(p, q) model for $Z_t$ in standard form but such that the 
coefficient matrices $\Phi_j$ and $\Theta_j$ have a reduced-rank structure. On the other hand, inserting the 
factor $T^{-1}T$ in front of the $Z_{t-j}$ and $a_{t-j}$ in (14.4.6) yields a VARMA(p, q) representation 
for the transformed process $Y_t = T Z_t$ as 

$$ Y_t - \sum_{j=1}^{p} \Phi_j^* Y_{t-j} = e_t - \sum_{j=1}^{q} \Theta_j^* e_{t-j} $$

where $\Phi_j^* = G_j T^{-1} = T\Phi_j T^{-1}$, $\Theta_j^* = -H_j T^{-1} = T\Theta_j T^{-1}$, and $e_t = T a_t$. This 
VARMA representation for the transformed process is parsimonious in the sense that 
the ith row of $\Phi_j^*$ is zero for $j > p_i$ and the ith row of $\Theta_j^*$ is zero for $j > q_i$. In addition, 
some elements of the ith row of $\Theta_j^*$, for $j = 1, \ldots, q_i$, are specified to be zero to remove 
possible redundancy of the parameters in the AR and MA matrices. The method used to 
identify and eliminate redundant parameters is referred to as the rule of elimination. 

The approach of Tiao and Tsay (1989) thus identifies the scalar component processes 
$Y_t = T Z_t$ and their associated orders $(p_i, q_i)$ through canonical correlation methods, and 
then estimates a VARMA process for the transformed variables $Y_t$ with zero constraints 
imposed on some of the parameters. By comparison, the Kronecker index approach estimates 
Kronecker indices that lead to the echelon model form for the original series $Z_t$ directly. 
Also, the scalar component approach allows the orders of the AR and MA polynomials to differ, 
while the orders are the same for the Kronecker index approach. The scalar component approach 
may in this regard be viewed as a refinement over the Kronecker index approach. 

More detailed comparisons of the Kronecker index and the SCM model specification 
methods are provided by Reinsel (1997) and Tsay (1989b, 1991, 2014). A comparison of 
the forecasting performance of models specified by the two approaches was reported by 
Athanasopoulos et al. (2012), who found the results for SCM more favorable. Software 
modeling tools are available for both methods in the MTS package in R; for details and 
demonstrations, see Tsay (2014). 

Order Determination Using Linear Least Squares. Before we proceed to discuss parameter 
estimation in the next section, we will mention another method that has been considered 
for VARMA model specification. This is a multivariate extension of the two-stage linear 
least-squares regression approach presented for the univariate case by Hannan and Rissanen 
(1982) and briefly discussed in Section 6.2.4. At the first stage of this procedure, the 
VARMA model is approximated by a high-order pure VAR model and the least squares 
method is used to obtain an estimate $\hat{a}_t$ of the white noise error process $a_t$. In the second 
stage, one regresses $Z_t$ on the lagged $Z_{t-j}$ and lagged $\hat{a}_{t-j}$ for various combinations of p 
and q. A model selection criterion such as BIC is then employed to help select appropriate 
orders for the VARMA model. Use of this procedure may lead to one or two models that 
seem highly promising, which are later estimated by more efficient procedures such as the 
maximum likelihood method. Similar linear estimation methods have been proposed by 
Hannan and Kavalieris (1984), Poskitt (1992), and Lütkepohl and Poskitt (1996), among 
others, for determining the Kronecker index structure of the VARMA model. 
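
To make the two stages concrete, the following base R sketch applies them to a simulated bivariate VMA(1) series. It is an illustration only, not the procedure as implemented in any package: the sample size, the VAR order m = 15, the unit innovation covariance, the two candidate orders, and the form of the BIC used here are all assumptions of the sketch. 

```r
set.seed(123)
n <- 4000; k <- 2
theta1 <- matrix(c(0.8, -0.4, 0.7, 0.6), k, k)   # MA(1) matrix from Section 14.3.3
a <- matrix(rnorm(k * (n + 1)), n + 1, k)        # white noise, Sigma = I (assumed)
z <- a[-1, ] - a[-(n + 1), ] %*% t(theta1)       # rows are Z_t' = a_t' - a_{t-1}' Theta'

# Stage 1: least-squares fit of a long VAR(m); its residuals estimate a_t
m <- 15
E <- embed(z, m + 1)                             # each row: (Z_t', Z_{t-1}', ..., Z_{t-m}')
stage1 <- lm.fit(E[, -(1:k)], E[, 1:k])
a_hat <- stage1$residuals                        # rows correspond to t = m + 1, ..., n

# Stage 2: regress Z_t on lagged Z's and lagged a_hat's for candidate orders
zt    <- z[(m + 2):n, ]
lag_z <- z[(m + 1):(n - 1), ]                    # Z_{t-1}
lag_a <- a_hat[-nrow(a_hat), ]                   # a_hat_{t-1}
fit_ma <- lm.fit(lag_a, zt)                      # candidate (p, q) = (0, 1)
fit_ar <- lm.fit(lag_z, zt)                      # candidate (p, q) = (1, 0)

# One common form of BIC: log|S| + (number of parameters) * log(N) / N
bic <- function(res, npar) {
  N <- nrow(res)
  log(det(crossprod(res) / N)) + npar * log(N) / N
}
bic_ma <- bic(fit_ma$residuals, k^2)
bic_ar <- bic(fit_ar$residuals, k^2)

# The regression coefficient on a_hat_{t-1} estimates -Theta'
theta_hat <- -t(fit_ma$coefficients)
print(round(theta_hat, 2))
print(c(VMA1 = bic_ma, VAR1 = bic_ar))
```

In a full application the second stage would loop over a grid of (p, q) pairs, and the selected model would then be re-estimated by maximum likelihood as described in the text. 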


14.4.5 Estimation and Model Checking for VARMA Models 

Once a well-defined VARMA model has been specified, the estimation of the parameters is 
typically performed using maximum likelihood methods assuming normality. In the past, 
conditional likelihood approaches were often employed for computational convenience. In 
the VARMA(p, q) model, this corresponds to treating the unknown presample values of $Z_t$ 
and $a_t$ as fixed constants, with the $a_t$, $t = 0, \ldots, 1 - q$, typically set equal to zero. However, 
for many mixed models with an MA operator 0(B) having roots near the unit circle, the 
conditional likelihood approach has been shown to produce estimates with poorer finite 
sample properties than the unconditional, or exact, ML estimates. 

Various approaches to the construction of the exact Gaussian likelihood function have 
been considered in the literature. Earlier classical approaches to evaluate the exact 
likelihood were presented by Hillmer and Tiao (1979) and Nicholls and Hall (1979). Given 
N observations $Z_1, \ldots, Z_N$, the exact likelihood of a stationary VARMA(p, q) model 
$\Phi(B)Z_t = \Theta(B)a_t$ has the form 

$$ L = |\Sigma|^{-N/2} |\Omega|^{-1/2} |D|^{-1/2} \exp\left\{ -\frac{1}{2} \left[ \sum_{t=1}^{N} \hat{a}_t' \Sigma^{-1} \hat{a}_t + \hat{a}_*' \Omega^{-1} \hat{a}_* \right] \right\} \qquad (14.4.7) $$

where $a_* = (Z_{1-p}', \ldots, Z_0', a_{1-q}', \ldots, a_0')'$ denotes the vector of presample values, 
$\hat{a}_* = E[a_* \mid Z_1, \ldots, Z_N]$ represents the conditional expectation of $a_*$ given the data, 
$\Omega = \mathrm{cov}[a_*]$ denotes the covariance matrix of $a_*$, and $D^{-1} = \mathrm{cov}[a_* - \hat{a}_*]$. The $\hat{a}_t$ 
satisfy the recursion 

$$ \hat{a}_t = Z_t - \sum_{j=1}^{p} \Phi_j Z_{t-j} + \sum_{j=1}^{q} \Theta_j \hat{a}_{t-j} \qquad t = 1, \ldots, N \qquad (14.4.8) $$

where the presample values are the estimated values $\hat{Z}_t$, $t = 1 - p, \ldots, 0$, and $\hat{a}_t$, 
$t = 1 - q, \ldots, 0$. Details of the calculations are given in the papers referenced above. Explicit 
expressions for the quantities $\Omega$, D, and $\hat{a}_*$ are also provided by Reinsel (1997, Section 
5.3.1). 

Other approaches to likelihood evaluation emphasize the innovations form of the exact 
likelihood and the use of the state-space model representation of the VARMA model and 
the associated Kalman filtering methods; see, for example, Ansley and Kohn (1983), Solo 
(1984a), and Shea (1987). The innovations form of the exact likelihood is 

$$ L = \left( \prod_{t=1}^{N} |\Sigma_{t|t-1}|^{-1/2} \right) \exp\left\{ -\frac{1}{2} \sum_{t=1}^{N} a_{t|t-1}' \Sigma_{t|t-1}^{-1} a_{t|t-1} \right\} \qquad (14.4.9) $$

where $a_{t|t-1} = Z_t - \hat{Z}_{t|t-1}$ is the one-step prediction error, or innovation, 

$$ \hat{Z}_{t|t-1} = E[Z_t \mid Z_{t-1}, \ldots, Z_1] $$

denotes the linear predictor of $Z_t$ based on $Z_{t-1}, \ldots, Z_1$, and $\Sigma_{t|t-1} = \mathrm{cov}[a_{t|t-1}]$ is 
the one-step prediction error covariance matrix. The $a_{t|t-1}$ and $\Sigma_{t|t-1}$, for $t = 1, \ldots, N$, 
can be computed recursively using the innovations algorithm described by Brockwell 
and Davis (1991) and Reinsel (1997). Equivalently, the quantities $a_{t|t-1} = Z_t - \hat{Z}_{t|t-1}$ and 
$\Sigma_{t|t-1}$ are also obtained naturally as outputs from the Kalman filtering algorithm applied 
to the state-space representation of the VARMA model, which is discussed in more detail 
in Section 14.6. Asymptotic theory of the resulting maximum likelihood estimators for 
VARMA models has been studied by Dunsmuir and Hannan (1976), Deistler et al. (1978), 
and Hannan and Deistler (1988). 

Diagnostic Checking. The checking of the fitted model can be performed using the tools 
described for VAR models in Section 14.2.6. These include plots of the residuals against 
time and/or against other variables and detailed examination of the autocorrelation and 
cross-correlation functions of the residuals. These tools can provide valuable information 
about possible lack of fit and suggest directions for model improvement. Useful 
supplementary tools include the portmanteau test and similar statistical tests. These tools also 
extend to fitted models with constraints imposed on the parameter coefficient matrices (i.e., 
structured parameterizations), such as echelon canonical form and reduced-rank models 
discussed in more detail in Section 14.7. For example, the statistic $Q_s$ will then have $k^2 s - b$ 
degrees of freedom in its limiting chi-squared distribution, where b denotes the number of 
unconstrained parameters involved in the estimation of the ARMA model coefficients $\Phi_j$ 
and $\Theta_j$. 

14.4.6 Relation of VARMA Models to Transfer Function and ARMAX Models 

The relationship between a bivariate VAR(l) model and a transfer function model was 
mentioned in Section 14.2.1. We will now briefly examine the relationship between 
subcomponents in a more general VARMA(p, q) process. We begin by partitioning the 
k-dimensional vector process $Z_t$ into two groups of subcomponents of dimensions $k_1$ and 
$k_2$, respectively, as $Z_t = (Z_{1t}', Z_{2t}')'$. The innovations vector $a_t$ and the AR and MA 
matrix polynomials are partitioned accordingly as $a_t = (a_{1t}', a_{2t}')'$ and 

$$ \Phi(B) = \begin{bmatrix} \Phi_{11}(B) & \Phi_{12}(B) \\ \Phi_{21}(B) & \Phi_{22}(B) \end{bmatrix}, \qquad \Theta(B) = \begin{bmatrix} \Theta_{11}(B) & \Theta_{12}(B) \\ \Theta_{21}(B) & \Theta_{22}(B) \end{bmatrix} $$

Suppose now that $\Phi_{12}(B)$ and $\Theta_{12}(B)$ are both identically zero, and for convenience also 
assume that $\Theta_{21}(B) = 0$. The equations for the VARMA model can then be expressed in 
two distinct groups as 

$$ \Phi_{11}(B) Z_{1t} = \Theta_{11}(B) a_{1t} \qquad (14.4.10a) $$

and 

$$ \Phi_{22}(B) Z_{2t} = -\Phi_{21}(B) Z_{1t} + \Theta_{22}(B) a_{2t} \qquad (14.4.10b) $$

We see from these expressions that future values of the process $Z_{1t}$ are only influenced by 
its own past and not by the past of $Z_{2t}$, whereas future values of $Z_{2t}$ are influenced by the 
past of both $Z_{1t}$ and $Z_{2t}$. Notice that even if $\Theta_{21}(B) \neq 0$, this conclusion still holds since 
the additional term in (14.4.10b) would then be $\Theta_{21}(B) a_{1t} = \Theta_{21}(B)\Theta_{11}^{-1}(B)\Phi_{11}(B) Z_{1t}$. 

In the terminology of causality from econometrics, under (14.4.10a) and (14.4.10b), 
the variables $Z_{1t}$ are said to cause $Z_{2t}$, but $Z_{2t}$ do not cause $Z_{1t}$. The variables $Z_{1t}$ are 
referred to as exogenous variables, and (14.4.10b) is often referred to as an ARMAX model 
or ARMAX system for the output variables $Z_{2t}$ with $Z_{1t}$ serving as input variables. The 
X in ARMAX stands for exogenous. The model (14.4.10b) can be rewritten as 

$$ Z_{2t} = \Psi_*(B) Z_{1t} + \Psi_{22}(B) a_{2t} $$

where 

$$ \Psi_*(B) = -\Phi_{22}^{-1}(B)\Phi_{21}(B) \quad \text{and} \quad \Psi_{22}(B) = \Phi_{22}^{-1}(B)\Theta_{22}(B) $$

This equation provides a representation for the output process $Z_{2t}$ as a causal linear filter 
of the input process $Z_{1t}$ with added unobservable noise, that is, 

$$ Z_{2t} = \Psi_*(B) Z_{1t} + N_t \qquad (14.4.11) $$

where the noise process $N_t$ follows a VARMA model $\Phi_{22}(B) N_t = \Theta_{22}(B) a_{2t}$. Since the 
ARMAX model can be viewed as a special case of the VARMA model, the methods for 
model building are quite similar to those used for the VARMA model. These include the use 
of model selection criteria and least-squares estimation methods for model specification 
and examination of the residuals from the fitted model for model checking. For further 
discussion, see, for example, Hannan and Deistler (1988, Chapter 4) and Reinsel (1997, 
Chapter 8). 

In the special case of bivariate time series, $Z_{1t} = z_{1t}$ and $Z_{2t} = z_{2t}$ are each univariate 
time series. Then we see from the above that when $\Phi_{12}(B) = 0$ and $\Theta_{12}(B) = 0$, the model 
reduces to the structure of the "unidirectional" instantaneous transfer function model with 
$z_{1t}$ as the "input" process and $z_{2t}$ as the output, assuming independence between $z_{2t}$ and 
the noise term of $z_{1t}$. More generally, assuming independence between $Z_{1t}$ and $N_t$ above, 
(14.4.11) can be viewed as a multivariate generalization of the univariate (single-equation) 
transfer function model discussed in Chapters 11 and 12. 


14.5 FORECASTING FOR VECTOR AUTOREGRESSIVE-MOVING 
AVERAGE PROCESSES 

14.5.1 Calculation of Forecasts from ARMA Difference Equation 

For forecasting in the VARMA(p, q) model 

$$ Z_t = \sum_{j=1}^{p} \Phi_j Z_{t-j} + \delta + a_t - \sum_{j=1}^{q} \Theta_j a_{t-j} \qquad (14.5.1) $$

where $\delta = (I - \Phi_1 - \cdots - \Phi_p)\mu$ for stationary processes, we assume that the white noise 
series $a_t$ are mutually independent random vectors. From general principles of prediction, 
the predictor of a future value $Z_{t+l}$, $l = 1, 2, \ldots$, based on observations available at time 
t, $\{Z_s, s \leq t\}$, that yields the minimum mean squared error (MSE) matrix is given by 
$\hat{Z}_t(l) = E[Z_{t+l} \mid Z_t, Z_{t-1}, \ldots]$. So from a computational view, forecasts are determined by 
applying conditional expectations to both sides of the VARMA(p, q) relation 

$$ \Phi(B) Z_{t+l} = \delta + \Theta(B) a_{t+l} $$

using the result that $E[a_{t+h} \mid Z_t, Z_{t-1}, \ldots] = 0$, $h > 0$, since $a_{t+h}$ is independent of present 
and past values of the series. Thus, forecasts $\hat{Z}_t(l)$ can be computed recursively from the 
VARMA model difference equation as 

$$ \hat{Z}_t(l) = \sum_{j=1}^{p} \Phi_j \hat{Z}_t(l - j) + \delta - \sum_{j=l}^{q} \Theta_j a_{t+l-j} \qquad l = 1, 2, \ldots, q \qquad (14.5.2) $$

with $\hat{Z}_t(l) = \sum_{j=1}^{p} \Phi_j \hat{Z}_t(l - j) + \delta$ for $l > q$, where $\hat{Z}_t(l - j) = Z_{t+l-j}$ for $l \leq j$. Note 
that for pure VAR models with q = 0 

$$ \hat{Z}_t(l) = \sum_{j=1}^{p} \Phi_j \hat{Z}_t(l - j) + \delta, \qquad \text{for all } l = 1, 2, \ldots $$

So the p initial forecast values are completely determined by the last p observations 
$Z_t, Z_{t-1}, \ldots, Z_{t-p+1}$; hence, for AR models all forecasts depend only on these last p 
observations in the series. 

For models that involve an MA term, in practice it is necessary to generate the white 
noise sequence $a_t$ recursively from the past data $Z_1, Z_2, \ldots, Z_t$, as 

$$ a_s = Z_s - \sum_{j=1}^{p} \Phi_j Z_{s-j} - \delta + \sum_{j=1}^{q} \Theta_j a_{s-j} \qquad s = 1, 2, \ldots, t $$

using appropriate starting values for $a_0, \ldots, a_{1-q}$ and $Z_0, \ldots, Z_{1-p}$. One way to 
estimate the starting values is to use the backcasting technique described earlier for 
evaluation of the exact likelihood function for ARMA models. This method yields 
$\hat{a}_{1-j} = E[a_{1-j} \mid Z_t, \ldots, Z_1]$, $j = 1, \ldots, q$, and $\hat{Z}_{1-j} = E[Z_{1-j} \mid Z_t, \ldots, Z_1]$, $j = 1, \ldots, p$. 
The resulting forecasts $\hat{Z}_t(l)$ are then equal to 

$$ \hat{Z}_t(l) = E[Z_{t+l} \mid Z_t, \ldots, Z_1] $$

These are optimal forecasts based on the finite past history $Z_t, Z_{t-1}, \ldots, Z_1$, although the 
analysis of forecast properties given below assumes that the forecasts are based on the 
infinite past history $Z_s$, all $s \leq t$. However, these two forecasts will be nearly identical 
for any moderate or large value of t, the number of past values available for forecasting. 
Alternative methods to obtain the "exact" finite sample forecasts, as well as the exact 
covariance matrices of the forecast errors, based on the finite sample data $Z_1, \ldots, Z_t$, in 
a convenient computational manner are through an innovations approach or through the 
closely related state-space model and Kalman filter approach that will be discussed briefly 
in Section 14.6. 
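
The innovation recursion and the forecast equation (14.5.2) can be sketched in base R for a zero-mean bivariate VMA(1) model. The simulated data, the unit innovation covariance, and the zero starting value $a_0 = 0$ (used here in place of backcasting) are assumptions made to keep the illustration short. 

```r
set.seed(1)
n <- 200
theta1 <- matrix(c(0.8, -0.4, 0.7, 0.6), 2, 2)   # MA(1) matrix from Section 14.3.3
a <- matrix(rnorm(2 * n), n, 2)                  # rows are a_t'; Sigma = I (assumed)
z <- a
z[-1, ] <- a[-1, ] - a[-n, ] %*% t(theta1)       # Z_t = a_t - Theta a_{t-1}, taking a_0 = 0

# Recursion a_s = Z_s + Theta a_{s-1}, s = 1, ..., t, started from a_0 = 0
a_hat <- matrix(0, n, 2)
prev  <- c(0, 0)
for (s in seq_len(n)) {
  a_hat[s, ] <- z[s, ] + drop(theta1 %*% prev)
  prev <- a_hat[s, ]
}

# Forecasts from origin t = n using (14.5.2) with p = 0, q = 1, delta = 0
z_hat_1 <- -drop(theta1 %*% a_hat[n, ])          # lead 1: -Theta a_t
z_hat_2 <- c(0, 0)                               # leads l > q revert to the mean

print(max(abs(a_hat - a)))                       # recursion recovers the simulated innovations
print(z_hat_1)
```

Because the simulation itself starts from $a_0 = 0$, the recursion reproduces the innovations up to rounding error; with real data the effect of the starting value dies out at a rate governed by the eigenvalues of $\Theta$. 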

14.5.2 Forecasts from Infinite VMA Form and Properties of Forecast Errors 

To establish the theoretical MSE properties of the forecast errors, we use the "infinite" 
moving average representation $Z_t = \Psi(B) a_t$ of the VARMA(p, q) model, where $\Psi(B) = 
\Phi^{-1}(B)\Theta(B) = \sum_{j=0}^{\infty} \Psi_j B^j$. A future value $Z_{t+l}$, relative to the forecast origin t, can then 
be expressed as 

$$ Z_{t+l} = \sum_{j=0}^{\infty} \Psi_j a_{t+l-j} = a_{t+l} + \Psi_1 a_{t+l-1} + \cdots + \Psi_{l-1} a_{t+1} + \Psi_l a_t + \cdots $$

Thus, since $E[a_{t+h} \mid Z_t, Z_{t-1}, \ldots] = 0$, $h > 0$, the minimum MSE matrix predictor of $Z_{t+l}$ 
based on $Z_t, Z_{t-1}, \ldots$ can be represented as 

$$ \hat{Z}_t(l) = E[Z_{t+l} \mid Z_t, Z_{t-1}, \ldots] = \sum_{j=l}^{\infty} \Psi_j a_{t+l-j} \qquad (14.5.3) $$

The l-step-ahead forecast error is $e_t(l) = Z_{t+l} - \hat{Z}_t(l) = \sum_{j=0}^{l-1} \Psi_j a_{t+l-j}$, which has zero mean 
and covariance matrix: 

$$ \Sigma(l) = \mathrm{cov}[e_t(l)] = E[e_t(l) e_t(l)'] = \sum_{j=0}^{l-1} \Psi_j \Sigma \Psi_j' \qquad \Psi_0 = I \qquad (14.5.4) $$

In particular, for one step ahead, $e_t(1) = Z_{t+1} - \hat{Z}_t(1) = a_{t+1}$ with error covariance matrix 
$\Sigma$, so that the white noise series $a_t$ can be interpreted as a sequence of one-step-ahead 
forecast errors for the process. 
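
As a small numerical illustration of (14.5.4), the base R sketch below evaluates the forecast error covariance matrices for the VMA(1) example of Section 14.3.3, for which $\Psi_0 = I$, $\Psi_1 = -\Theta$, and $\Psi_j = 0$ for $j \geq 2$. The helper name forecast_mse is hypothetical. 

```r
theta1 <- matrix(c(0.8, -0.4, 0.7, 0.6), 2, 2)
sig    <- matrix(c(4, 1, 1, 2), 2, 2)

# Psi weights of the VMA(1) model; all weights beyond lag 1 are zero
Psi <- list(diag(2), -theta1)

# Sigma(l) = sum_{j=0}^{l-1} Psi_j Sigma Psi_j'
forecast_mse <- function(l) {
  out <- matrix(0, 2, 2)
  for (j in 0:(l - 1)) {
    if (j + 1 <= length(Psi)) {
      out <- out + Psi[[j + 1]] %*% sig %*% t(Psi[[j + 1]])
    }
  }
  out
}

print(forecast_mse(1))    # equals Sigma
print(forecast_mse(2))    # equals Gamma(0) = Sigma + Theta Sigma Theta'
print(forecast_mse(5))    # unchanged beyond lead q = 1
```

For this model the error covariance reaches the process covariance $\Gamma(0)$ at lead 2, reflecting the fact that a VMA(1) forecast reverts to the mean after one step. 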

It follows from the infinite MA representation of the forecasts given by (14.5.3) that we 
obtain the multivariate version of the updating formula (5.2.5) as 

$$ \hat{Z}_{t+1}(l) = E[Z_{t+l+1} \mid Z_{t+1}, Z_t, \ldots] = \sum_{j=l}^{\infty} \Psi_j a_{t+l+1-j} = \hat{Z}_t(l + 1) + \Psi_l a_{t+1} \qquad (14.5.5) $$

where $a_{t+1} = Z_{t+1} - \hat{Z}_t(1)$ is the one-step-ahead forecast error. This provides a simple 
relationship to indicate how the forecast $\hat{Z}_t(l)$ with forecast origin t is adjusted or updated 
to incorporate the information available from a new observation $Z_{t+1}$ at time t + 1. 

For the case of unit-root nonstationary processes to be discussed in Section 14.8, similar 
forecasting topics as presented above can also be developed and results such as (14.5.2) 
and (14.5.4) continue to apply. 


14.6 STATE-SPACE FORM OF THE VARMA MODEL 

The state-space model was introduced for univariate ARMA models in Section 5.5. Similar 
to the univariate case, the VARMA model can be represented in the equivalent state-space 
form, which is of interest for purposes of prediction as well as for model specification 
and maximum likelihood estimation of parameters. The state-space model consists of a 
transition or state equation 

$$Y_t = \Phi Y_{t-1} + \varepsilon_t$$

and an observation equation 

$$Z_t = H Y_t + N_t$$

where $Y_t$ is an $r \times 1$ (unobservable) time series vector called the state vector, and $\varepsilon_t$ 
and $N_t$ are independent white noise processes. In this representation, the state vector $Y_t$ 
conceptually contains all information from the past of the process $Z_t$ that is relevant 
for the future of the process, and, hence, the dynamics of the system can be represented 
in the simple first-order or Markovian transition equation for the state vector. The above 
state-space model is said to be stable if all the eigenvalues of the matrix $\Phi$ are less than one 
in absolute value, and conversely, it can be shown that any stationary process $Z_t$ that has 
a stable state-space representation of the above form can also be represented in the form 
of a stationary VARMA(p, q) model; see, for example, Akaike (1974b). Hence, it follows 
that any process $Z_t$ that satisfies a stable state-space representation can be expressed in 
the causal convergent infinite moving average form $Z_t = \Psi(B)a_t$. The stability condition 
for the matrix $\Phi$ in the state-space model is equivalent to the stability condition for the 
matrix coefficients $\Psi_j$ of the linear filter $\Psi(B)$ (see Appendix A14.1.2), since it ensures 
that $\sum_{j=0}^{\infty} \|\Psi_j\| < \infty$ in the representation $Z_t = \Psi(B)a_t$. 

For the VARMA(p, q) model (14.5.1) (with $\delta = 0$), define the predictors $\hat{Z}_t(j) = 
E[Z_{t+j} \mid Z_t, Z_{t-1}, \ldots]$ as in Section 14.5.1 for $j = 0, 1, \ldots, r - 1$, with $r = \max(p, q + 1)$, 
and $\hat{Z}_t(0) = Z_t$. From the updating equations (14.5.5), we have $\hat{Z}_t(j - 1) = \hat{Z}_{t-1}(j) + 
\Psi_{j-1} a_t$, $j = 1, 2, \ldots, r - 1$. Also, for $j = r > q$ we find using (14.5.2) that 

$$\hat{Z}_t(j - 1) = \hat{Z}_{t-1}(j) + \Psi_{j-1} a_t = \sum_{i=1}^{p} \Phi_i \hat{Z}_{t-1}(j - i) + \Psi_{j-1} a_t$$

Let us define the “state” vector at time t, with r vector components, as $Y_t = 
[\hat{Z}_t(0)', \hat{Z}_t(1)', \ldots, \hat{Z}_t(r - 1)']'$. Then, from the relations above, the state vector $Y_t$ 
satisfies the state-space (transition) equations 

$$Y_t = \begin{bmatrix} 0 & I & 0 & \cdots & 0 \\ 0 & 0 & I & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & I \\ \Phi_r & \Phi_{r-1} & \Phi_{r-2} & \cdots & \Phi_1 \end{bmatrix} Y_{t-1} + \begin{bmatrix} I \\ \Psi_1 \\ \vdots \\ \Psi_{r-2} \\ \Psi_{r-1} \end{bmatrix} a_t \tag{14.6.1}$$

where $\Phi_i = 0$ if $i > p$. Thus, we have 

$$Y_t = \Phi Y_{t-1} + \Psi a_t \tag{14.6.2}$$

together with the observation equation 

$$Z_t^* = Z_t + N_t = [I, 0, \ldots, 0]Y_t + N_t = HY_t + N_t \tag{14.6.3}$$

where the vector noise $N_t$ would be present only if the process $Z_t$ is observed subject to 
additional white noise error; otherwise, we simply have $Z_t = Z_t^* = HY_t$. For convenience, 
we assume in the remainder of this section that the additional white noise is not present. 
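
To make the construction concrete, the sketch below builds the matrices of (14.6.1) to (14.6.3) for a bivariate VARMA(1, 1) model, where $r = \max(p, q + 1) = 2$, and reads the infinite MA weights back off the state-space form as $\Psi_j = H\Phi^j\Psi$. The numerical values of `phi1` and `theta1` and all function names are hypothetical; this is an illustration under those assumptions rather than a general implementation.

```python
# Sketch of the state-space form (14.6.1)-(14.6.3) for a VARMA(1, 1) model,
# where r = max(p, q + 1) = 2. All numerical values are hypothetical.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][m] * b[m][j] for m in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def state_space_varma11(phi1, theta1):
    """Return (transition, loading, h) with Y_t = transition Y_{t-1} + loading a_t, Z_t = h Y_t."""
    k = len(phi1)
    ident = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    zero = [[0.0] * k for _ in range(k)]
    # Psi_1 = Phi_1 - Theta_1 for a VARMA(1, 1) model.
    psi1 = [[p - t for p, t in zip(pr, tr)] for pr, tr in zip(phi1, theta1)]
    # Block rows [0, I] and [Phi_2, Phi_1], with Phi_2 = 0 because p = 1 < r = 2.
    transition = ([zero[i] + ident[i] for i in range(k)]
                  + [zero[i] + phi1[i] for i in range(k)])
    loading = ident + psi1                       # stacked blocks [I; Psi_1]
    h = [ident[i] + zero[i] for i in range(k)]   # H = [I, 0]
    return transition, loading, h

def implied_ma_weights(transition, loading, h, n):
    """Psi_j = H Phi^j Psi for j = 0, ..., n, read off the state-space form."""
    weights, current = [], loading
    for _ in range(n + 1):
        weights.append(matmul(h, current))
        current = matmul(transition, current)
    return weights

phi1 = [[0.5, 0.1], [0.0, 0.4]]     # hypothetical AR matrix
theta1 = [[0.2, 0.0], [0.1, 0.3]]   # hypothetical MA matrix
transition, loading, h = state_space_varma11(phi1, theta1)
psi = implied_ma_weights(transition, loading, h, 3)
for j, weight in enumerate(psi):
    print(j, weight)
```

The weights recovered this way agree with the usual VARMA(1, 1) recursion $\Psi_1 = \Phi_1 - \Theta_1$, $\Psi_j = \Phi_1\Psi_{j-1}$, which is one way to confirm that the state vector carries the full dynamics.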

The state or transition equation (14.6.2) and the observation equation (14.6.3) constitute 
a state-space representation of the VARMA model. There are many other constructions 
of the state vector $Y_t$ that will give rise to state-space equations of the general form 
(14.6.2) and (14.6.3); that is, the state-space form of the VARMA model is not unique. 
Specifically, if we transform the state vector $Y_t$ into $\bar{Y}_t = PY_t$, where P is an arbitrary 
nonsingular matrix, then models (14.6.2) and (14.6.3) can be written in a similar form in 
terms of $\bar{Y}_t$ with $\bar{\Phi} = P\Phi P^{-1}$, $\bar{H} = HP^{-1}$, and $\bar{\Psi} = P\Psi$. The particular form given above 
has the state vector $Y_t$, which can be viewed as generating the space of predictions of all 
future values of the process $Z_t$, since $\hat{Z}_t(l) = \sum_{i=1}^{r} \Phi_i \hat{Z}_t(l - i)$ for $l > r - 1$. 

In the state-space model, the unobservable state vector $Y_t$ constitutes a summary of the 
state of the dynamic system through time t, and the state equation (14.6.2) describes the 
evolution of the dynamic system in time. The minimal dimension of the state vector $Y_t$ in a 
state-space representation needs to be sufficiently large so that the dynamics of the system can 
be represented by the simple Markovian first-order structure. State-space representations 
for the VARMA model can exist with a state vector of minimal dimension smaller than 
the dimension in (14.6.1). This minimal dimension is the dimension of the set of basis 
predictors that generate the linear space of predictors of all future values; it is of smaller 
dimension than in (14.6.1) whenever the state vector $Y_t$ can be represented linearly in 
terms of a smaller number of basis elements. Specifically, suppose that $Y_t$ in (14.6.1) can 
be expressed as $Y_t = AY_t^*$, where $Y_t^*$ is an $M \times 1$ vector whose elements form a subset of 
the elements of $Y_t$, with $M < rk$ being the smallest possible such dimension. Then A is an 
$rk \times M$ matrix of full rank M, with $Y_t^* = (A'A)^{-1}A'Y_t$, and we assume the first $k \times M$ 
block row of A is $[I, 0, \ldots, 0]$. Thus, multiplying (14.6.2) on the left by $(A'A)^{-1}A'$, we 
obtain the equivalent representation of minimal dimension M given by $Y_t^* = \Phi^* Y_{t-1}^* + \Psi^* a_t$, 
where $\Phi^* = (A'A)^{-1}A'\Phi A$ and $\Psi^* = (A'A)^{-1}A'\Psi$, with $Z_t = HAY_t^* \equiv HY_t^*$. This 
minimal dimension M is in fact the McMillan degree of the process $\{Z_t\}$ as described in 
Section 14.7.1 below. 

One important use of the state-space form of the VARMA model is that it enables exact 
finite sample forecasts of the process $\{Z_t\}$ to be obtained through Kalman filtering and 
the associated prediction algorithm. This provides a convenient computational procedure 
to obtain the minimum MSE matrix estimate of the state vector $Y_{t+l}$ based on observations 
$Z_1, \ldots, Z_t$ as $\hat{Y}_{t+l|t} = E[Y_{t+l} \mid Z_1, \ldots, Z_t]$, with 

$$P_{t+l|t} = E[(Y_{t+l} - \hat{Y}_{t+l|t})(Y_{t+l} - \hat{Y}_{t+l|t})']$$

equal to the error covariance matrix. The recursions for the Kalman filter procedure have 
been presented as equations (5.5.6) to (5.5.9) in Section 5.5.2. It follows that optimal 
forecasts $\hat{Z}_{t+l|t} = E[Z_{t+l} \mid Z_1, \ldots, Z_t]$ of future observations $Z_{t+l}$ are then available as 
$\hat{Z}_{t+l|t} = H\hat{Y}_{t+l|t}$, since $Z_{t+l} = HY_{t+l}$, with forecast error covariance matrix 

$$\Sigma_{t+l|t} = E[(Z_{t+l} - \hat{Z}_{t+l|t})(Z_{t+l} - \hat{Z}_{t+l|t})'] = HP_{t+l|t}H'$$

The “steady-state” values of the Kalman filtering lead-l forecast error covariance 
matrices, obtained as t increases, equal the expressions in (14.5.4) of Section 14.5.2, 
$\Sigma(l) = \sum_{j=0}^{l-1} \Psi_j \Sigma \Psi_j'$. That is, $\Sigma_{t+l|t}$ approaches $\Sigma(l)$ as $t \to \infty$. 

Thus, the Kalman filtering procedure provides a convenient method to obtain exact 
finite sample forecasts for future values in the VARMA process, based on observations 
$Z_1, \ldots, Z_t$, subject to specification of appropriate initial conditions to use in (5.5.6) 
to (5.5.9). In particular, for the VARMA process represented in state-space form, the 
exact finite-sample one-step-ahead forecasts $\hat{Z}_{t|t-1} = H\hat{Y}_{t|t-1}$, and their error covariance 
matrices $\Sigma_{t|t-1} = HP_{t|t-1}H'$, can be obtained conveniently through the Kalman filtering 
equations. This can be particularly useful for evaluation of the exact Gaussian likelihood 
function, based on N vector observations $Z_1, \ldots, Z_N$ from the VARMA process, as 
mentioned earlier in Section 14.4.5. 


14.7 FURTHER DISCUSSION OF VARMA MODEL SPECIFICATION 

In this section, we return to the issue of model specification for a vector ARMA process. 
As noted in Section 14.4.4, extending the ARMA model to the vector case involves some 
difficulties that are not present in the univariate case. One problem in the vector case is 
the overflow of parameters, whose estimates can be highly correlated. A second issue is 
that of identifiability, which refers to the fact that two different sets of parameters can give 
rise to the same probability structure and hence the same process. This causes problems at 
the parameter estimation stage, in particular, since the likelihood function will not have a 
uniquely defined maximum in this case. Two methods designed to overcome these issues 
are the Kronecker index approach that originates in the engineering literature and the 
SCM method developed by Tiao and Tsay (1989). Both methods make use of canonical 
correlation analysis methods to arrive at a parsimonious and well-defined VARMA model. 

In this section, we will discuss the VARMA model specification in more detail focusing 
on the Kronecker index approach to model specification. We first discuss the estimation of 
the Kronecker indices and the McMillan degree of a vector process. We then describe the 
specification of the echelon canonical form of the VARMA model through the Kronecker 
indices. A brief discussion of the use of partial canonical correlation analysis to identify 
models with reduced rank structure is also included. 


14.7.1 Kronecker Structure for VARMA Models 

The VARMA(p, q) model (14.4.1) can always be expressed in the equivalent form 

$$\Phi_0^{\#}(Z_t - \mu) - \sum_{j=1}^{p} \Phi_j^{\#}(Z_{t-j} - \mu) = \Theta_0^{\#} a_t - \sum_{j=1}^{q} \Theta_j^{\#} a_{t-j} \tag{14.7.1}$$

where $\Phi_0^{\#}$ is an arbitrary nonsingular matrix, $\Phi_j^{\#} = \Phi_0^{\#}\Phi_j$, $\Theta_0^{\#} = \Phi_0^{\#}$, and $\Theta_j^{\#} = \Phi_0^{\#}\Theta_j$. For 
purposes of parsimony, we are interested in model forms that lead to the simplest structure 
in some sense, such as in terms of the number of unknown parameters in the matrices 
$\Phi_0^{\#}, \Phi_1^{\#}, \ldots, \Phi_p^{\#}, \Theta_1^{\#}, \ldots, \Theta_q^{\#}$. For unique identifiability of the parameters, it is necessary to 
normalize the form of $\Phi_0^{\#}$ at least to be lower triangular with ones on the diagonal. 

As discussed in detail by Hannan and Deistler (1988, Chapter 2), a representation of 
a VARMA model in a certain special form of (14.7.1) can sometimes be more useful for 
model specification than the standard or reduced VARMA form (14.4.1), and this form of 
(14.7.1) is referred to as the echelon canonical form of the VARMA model. To specify 
the echelon canonical form, k Kronecker indices or structural indices, $K_1, \ldots, K_k$, must 
be determined beyond the overall orders p and q. The echelon (canonical) form is such 
that $[\Phi^{\#}(B), \Theta^{\#}(B)]$ has the smallest possible row degrees, and $K_i$ denotes the degree of 
the ith row of $[\Phi^{\#}(B), \Theta^{\#}(B)]$, that is, the maximum of the degrees of the polynomials 
in the ith row of $[\Phi^{\#}(B), \Theta^{\#}(B)]$, for $i = 1, \ldots, k$, and with $p = q = \max\{K_1, \ldots, K_k\}$. 
The specification of these Kronecker indices or “row orders” $\{K_i\}$, which are unique for 
any given equivalence class of ARMA models, that is, models with the same infinite MA 
operator $\Psi(B)$, then determines a unique echelon canonical form of the VARMA model 
(14.7.1) in which the unknown parameters are uniquely identifiable. 

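Kronecker Indices and McMillan Degree of VARMA Process. For any stationary 
vector process $\{Z_t\}$ with covariance matrices $\Gamma(l) = \mathrm{cov}[Z_t, Z_{t+l}]$, we define the 
infinite-dimensional (block) Hankel matrix of the covariances as 

$$H = \begin{bmatrix} \Gamma(1)' & \Gamma(2)' & \Gamma(3)' & \cdots \\ \Gamma(2)' & \Gamma(3)' & \Gamma(4)' & \cdots \\ \Gamma(3)' & \Gamma(4)' & \Gamma(5)' & \cdots \\ \vdots & \vdots & \vdots & \end{bmatrix} \tag{14.7.2}$$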


Then, in particular, the McMillan degree M of the process is defined as the rank of the 
Hankel matrix H. The process $\{Z_t\}$ follows a finite-order VARMA model if and only if the 
rank of H is finite. For a stationary VARMA(p, q) process, the moment relations (14.4.5) 
yield that 

$$\Gamma(l)' - \sum_{j=1}^{p} \Phi_j \Gamma(l - j)' = 0 \quad \text{for } l > q \tag{14.7.3}$$

It can be seen directly from this that the rank of H, the McMillan degree M, will then satisfy 
$M \le ks$, where $s = \max\{p, q\}$, since all the $k \times k$ block rows of H beyond the sth block 
row will be linearly dependent on the preceding block rows. But the McMillan degree M 
of a VARMA(p, q) could be considerably smaller than ks due to rank deficiencies in the 
AR and MA coefficient matrices. 

The McMillan degree M has the interpretation as the number of linearly independent 
linear combinations of the present and past vectors $Z_t, Z_{t-1}, \ldots$ that are needed for optimal 
prediction of all future vectors within the ARMA structure. Note that 

$$H = \mathrm{cov}[F_{t+1}, P_t] = \mathrm{cov}[\hat{F}_{t+1|t}, P_t] \tag{14.7.4}$$

is the covariance between the collection of all present and past vectors, $P_t = (Z_t', Z_{t-1}', \ldots)'$, 
and the collection of all future vectors $F_{t+1} = (Z_{t+1}', Z_{t+2}', \ldots)'$ or the collection of predicted 
values of all future vectors, $\hat{F}_{t+1|t} = E[F_{t+1} \mid P_t]$. Hence, if the rank of H is equal to M, 
then the (linear) predictor space formed from the collection $\hat{F}_{t+1|t}$ of predicted values 
$\hat{Z}_t(l) = E[Z_{t+l} \mid P_t]$, $l > 0$, of all future vectors is of finite dimension M. Sometimes (e.g., 
Hannan and Deistler, 1988, Chapter 2) the Hankel matrix H is defined in terms of the 
coefficients $\Psi_j$ in the infinite MA form $Z_t - \mu = \sum_{j=0}^{\infty} \Psi_j a_{t-j}$ of the ARMA process, 
instead of the covariance matrices $\Gamma(j)'$, but all main conclusions hold in either case. 
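
The rank interpretation can be illustrated numerically. The sketch below uses the $\Psi_j$-based variant of the Hankel matrix mentioned above, truncated to a finite number of block rows and columns (an assumption made for computability), for a bivariate VAR(1) process where $\Psi_j = \Phi_1^j$. The two coefficient matrices and all function names are hypothetical. With a rank-one $\Phi_1$ the computed rank is 1 rather than $ks = 2$.

```python
# Sketch: McMillan degree as the rank of a truncated block Hankel matrix built from
# the Psi_j weights of a VAR(1) process (Psi_j = Phi_1^j). Values are hypothetical.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][m] * b[m][j] for m in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def psi_hankel(phi1, n_blocks):
    """Truncated Hankel matrix whose (i, j) block is Psi_{i+j-1} = Phi_1^{i+j-1}."""
    k = len(phi1)
    powers = [[[1.0 if a == b else 0.0 for b in range(k)] for a in range(k)]]  # Phi_1^0
    for _ in range(2 * n_blocks - 1):
        powers.append(matmul(phi1, powers[-1]))
    rows = []
    for bi in range(n_blocks):
        for r in range(k):
            rows.append([x for bj in range(n_blocks) for x in powers[bi + bj + 1][r]])
    return rows

def matrix_rank(matrix, tol=1e-9):
    """Numerical rank by Gaussian elimination with partial pivoting."""
    rows = [list(map(float, r)) for r in matrix]
    rank = 0
    for col in range(len(rows[0])):
        if rank == len(rows):
            break
        pivot = max(range(rank, len(rows)), key=lambda i: abs(rows[i][col]))
        if abs(rows[pivot][col]) < tol:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(rank + 1, len(rows)):
            factor = rows[i][col] / rows[rank][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[rank])]
        rank += 1
    return rank

full_rank_phi = [[0.5, 0.1], [0.0, 0.4]]       # hypothetical, rank 2
reduced_rank_phi = [[0.5, 0.25], [0.2, 0.1]]   # hypothetical, rank 1

degree_full = matrix_rank(psi_hankel(full_rank_phi, 3))
degree_reduced = matrix_rank(psi_hankel(reduced_rank_phi, 3))
print(degree_full, degree_reduced)
```

Truncating at three block rows is enough here because, for a VAR(1) process, all later block rows are linear combinations of the first.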

In addition, the ith Kronecker index $K_i$, $i = 1, \ldots, k$, of the process $\{Z_t\}$ is the smallest 
value such that the $(kK_i + i)$th row of H, that is, the ith row in the $(K_i + 1)$th block of 
rows of H, is linearly dependent on the previous rows of H. This also implies, through 
the structure of the Hankel matrix H, that all rows $kl + i$, for every $l \ge K_i$, will also be 
linearly dependent on the rows preceding the $(kK_i + i)$th row. The set of Kronecker 
indices $\{K_1, \ldots, K_k\}$ is unique for any given VARMA process; hence, it is not dependent 
on any one particular form of the observationally equivalent ARMA model 
representations of the process. As indicated in Section 14.6, the VARMA model can be 
represented in its equivalent minimal dimension state-space form, with minimal dimension, the 
McMillan degree 

$$M = \sum_{i=1}^{k} K_i = K_1 + K_2 + \cdots + K_k$$

being the number of linearly independent predictors required to generate the linear 
prediction space $\{\hat{Z}_t(l), l \ge 1\}$ of all future vectors $\{Z_{t+l}, l \ge 1\}$. This minimal dimension 
state-space representation is one way to reveal the special structure of the VARMA 
parameters associated with the Kronecker indices. Canonical correlation analysis methods 
between past and future vectors of a VARMA process $\{Z_t\}$ are useful as a means to 
determine the Kronecker indices of the process. We will now indicate, in particular, the 
direct connections that the Kronecker indices have with the second moment equations as in 
(14.4.5) and (14.7.3), since these equations exhibit the row dependencies among the 
covariance matrices $\Gamma(j)'$. Hence, knowledge of these Kronecker indices can be used to deduce 
special structure among the AR and MA parameter matrices and lead to specification of 
the special (echelon) form of the VARMA model. 


Echelon Canonical Form Implied by Kronecker Indices. Specifically, if VARMA models 
similar to the form in (14.7.1) are considered, with $\Phi_0^{\#} = \Theta_0^{\#}$ lower triangular (and having 
ones on the diagonal), then equations similar to (14.4.5) for the cross-covariance matrices 
$\Gamma(l)$ of the process are obtained as 

$$\Phi_0^{\#}\Gamma(l)' - \sum_{j=1}^{p} \Phi_j^{\#}\Gamma(l - j)' = -\sum_{j=l}^{q} \Theta_j^{\#}\Sigma\Psi_{j-l}' \tag{14.7.5}$$

Thus, if $\phi_j(i)'$ denotes the ith row of $\Phi_j^{\#}$, then the ith Kronecker index equal to $K_i$ implies 
the linear dependence in the rows of the Hankel matrix H of the form 

$$\phi_0(i)'\Gamma(l)' - \sum_{j=1}^{K_i} \phi_j(i)'\Gamma(l - j)' = 0' \quad \text{for all } l \ge K_i + 1 \tag{14.7.6}$$

that is, $b_i'H = 0'$ with $b_i' = (-\phi_{K_i}(i)', \ldots, -\phi_1(i)', \phi_0(i)', 0', \ldots)$. Note that by definition of 
the ith Kronecker index $K_i$, the row vector $\phi_0(i)'$ in (14.7.6) can be taken to have a one in 
the ith position and zeros for positions greater than the ith. Therefore, a Kronecker index 
equal to $K_i$ implies, in particular, that an ARMA model representation of the form (14.7.1) 
can be constructed for the process such that the ith rows of the matrices $\Phi_j^{\#}$ and $\Theta_j^{\#}$ will be 
zero for $j > K_i$. 

In addition to these implications from (14.7.6), additional zero constraints on certain 
elements in the ith rows of the matrices $\Phi_j^{\#}$ for $j \le K_i$ can be specified. Specifically, the 
lth element of the ith row $\phi_j(i)'$ can be specified to be zero whenever $j + K_l \le K_i$ because 
for $K_l \le K_i$ the rows $k(K_l + j) + l$, $j = 0, \ldots, (K_i - K_l)$, of the Hankel matrix H are all 
linearly dependent on the previous rows of H. Hence, the (i, l)th element of the AR operator 

$$\Phi^{\#}(B) = \Phi_0^{\#} - \sum_{j=1}^{p} \Phi_j^{\#} B^j$$

in model (14.7.1) can be specified to have nonzero coefficients only for the lags $j = 
K_i - K_{il} + 1, \ldots, K_i$, with zero coefficients specified for any lower lags of j (when $i \ne l$), 
where we define 

$$K_{il} = \begin{cases} \min(K_i + 1, K_l) & \text{for } i > l \\ \min(K_i, K_l) & \text{for } i \le l \end{cases} \tag{14.7.7}$$

(so that whenever $K_l \le K_i$ we have $K_{il} = K_l$). Thus, the corresponding number of unknown 
AR parameters in the (i, l)th element of $\Phi^{\#}(B)$ is equal to $K_{il}$. Hence, the AR operator 
$\Phi^{\#}(B)$ in model (14.7.1) can be specified such that the total number of unknown parameters 
of $\Phi^{\#}(B)$ is equal to $\sum_{i=1}^{k} \sum_{l=1}^{k} K_{il} = M + \sum\sum_{i \ne l} K_{il}$, while the number of unknown 
parameters in the MA operator $\Theta^{\#}(B)$, excluding those parameters in $\Theta_0^{\#} = \Phi_0^{\#}$, is equal to 
$\sum_{i=1}^{k} kK_i = kM$. 
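
The counting rules in (14.7.7) are easy to mechanize. The sketch below computes $K_{il}$, the lags at which the (i, l)th AR element may be nonzero, and the resulting AR and MA parameter counts for a given set of Kronecker indices. The function names are hypothetical and introduced only for this illustration.

```python
# Sketch of (14.7.7) and the parameter counts of the echelon form.
# Positions i and l are 1-based, as in the text.

def k_il(kron, i, l):
    """K_il of (14.7.7) for Kronecker indices kron = (K_1, ..., K_k)."""
    k_i, k_l = kron[i - 1], kron[l - 1]
    return min(k_i + 1, k_l) if i > l else min(k_i, k_l)

def ar_lags(kron, i, l):
    """Lags j = K_i - K_il + 1, ..., K_i at which the (i, l) AR element is unrestricted."""
    k_i = kron[i - 1]
    return list(range(k_i - k_il(kron, i, l) + 1, k_i + 1))

def echelon_parameter_counts(kron):
    """Return (McMillan degree M, number of AR parameters, number of MA parameters)."""
    k = len(kron)
    degree = sum(kron)
    n_ar = sum(k_il(kron, i, l) for i in range(1, k + 1) for l in range(1, k + 1))
    n_ma = k * degree
    return degree, n_ar, n_ma

for kron in [(1, 1), (1, 0), (2, 1)]:
    print(kron, echelon_parameter_counts(kron),
          {(i, l): ar_lags(kron, i, l) for i in (1, 2) for l in (1, 2)})
```

A lag of 0 in the output for an off-diagonal position corresponds to an unknown element of $\Phi_0^{\#}$ below the diagonal.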

In summary, for a stationary linear process $\{Z_t\}$ with Kronecker indices $K_1, \ldots, K_k$, a 
VARMA representation as in (14.7.1) with $p = q = \max\{K_i\}$ can be specified to describe 
the process, with the matrices $\Phi_j^{\#}$ and $\Theta_j^{\#}$ possessing the structure that their ith rows are 
zero for $j > K_i$ and the additional zero constraints structure noted above. Moreover, for a 
stationary vector process with given covariance matrix structure $\Gamma(l)$, or equivalently with 
given infinite MA coefficients $\Psi_j$, Hannan and Deistler (1988, Theorem 2.5.1) have shown 
that this model provides a unique VARMA representation, with AR and MA operators 
$\Phi^{\#}(B)$ and $\Theta^{\#}(B)$ being left-coprime, and where all unknown parameters are identified. 
This (canonical) ARMA representation is referred to as a (reversed) echelon ARMA form. 
In particular, the VAR coefficient matrices $\Phi_j^{\#}$ in the echelon canonical representation 
(14.7.1) are uniquely determined from the $\Gamma(l)$ by the requirement that their ith rows 
$\phi_j(i)'$, $j = 0, \ldots, K_i$, $i = 1, \ldots, k$, satisfy the conditions (14.7.6). 


Examples. For simple illustrative examples, consider a bivariate (k = 2) process $\{Z_t\}$. 
When this process has Kronecker indices $K_1 = K_2 = 1$, then a general VARMA(1, 1) 
representation $Z_t - \Phi_1 Z_{t-1} = a_t - \Theta_1 a_{t-1}$ is implied. However, notice that a pure VAR(1) 
process with full-rank VAR matrix $\Phi_1$ and a pure VMA(1) process with full-rank VMA 
matrix $\Theta_1$ would both also possess Kronecker indices equal to $K_1 = K_2 = 1$. This simple 
example thus illustrates that specification of the Kronecker indices alone does not 
necessarily lead to the specification of a VARMA representation where all the simplifying structure 
in the parameters is directly revealed. For a second case, suppose the bivariate process 
has Kronecker indices $K_1 = 1$ and $K_2 = 0$. Then, the implied structure for the process is 
VARMA(1, 1) as in (14.7.1), with (note, in particular, that $K_{12} = 0$ in (14.7.7)) 

$$\Phi_0^{\#} = \begin{bmatrix} 1 & 0 \\ X & 1 \end{bmatrix}, \qquad \Phi_1^{\#} = \begin{bmatrix} X & 0 \\ 0 & 0 \end{bmatrix}, \qquad \Theta_1^{\#} = \begin{bmatrix} X & X \\ 0 & 0 \end{bmatrix}$$

where the X’s denote unknown parameters that need estimation and 0’s indicate values that 
are known to be specified as zero. On multiplication of the VARMA(1, 1) relation $\Phi_0^{\#}Z_t - 
\Phi_1^{\#}Z_{t-1} = \Theta_0^{\#}a_t - \Theta_1^{\#}a_{t-1}$ on the left by $\Phi_0^{\#-1}$, we obtain a VARMA(1, 1) representation 
$Z_t - \Phi_1 Z_{t-1} = a_t - \Theta_1 a_{t-1}$ in the standard VARMA form (14.4.1), but with a 
reduced-rank structure for the coefficient matrices such that rank$[\Phi_1, \Theta_1] = 1$. For a third situation, 
suppose the bivariate process has Kronecker indices $K_1 = 2$ and $K_2 = 1$. Then, the echelon 
form structure for the process is VARMA(2, 2) as in (14.7.1), with (note $K_{12} = 1$ in 
this case) 


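$$\Phi_0^{\#} = \begin{bmatrix} 1 & 0 \\ X & 1 \end{bmatrix}, \qquad \Phi_1^{\#} = \begin{bmatrix} X & 0 \\ X & X \end{bmatrix}, \qquad \Phi_2^{\#} = \begin{bmatrix} X & X \\ 0 & 0 \end{bmatrix}$$

$$\Theta_0^{\#} = \Phi_0^{\#}, \qquad \Theta_1^{\#} = \begin{bmatrix} X & X \\ X & X \end{bmatrix}, \qquad \Theta_2^{\#} = \begin{bmatrix} X & X \\ 0 & 0 \end{bmatrix}$$

Again, on multiplication of the echelon form VARMA(2, 2) relation on the left by $\Phi_0^{\#-1}$, 
we obtain a VARMA(2, 2) representation in standard form, but with reduced-rank structure 
for the coefficient matrices such that rank$[\Phi_2, \Theta_2] = 1$. 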
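
The reduced-rank claim for the second case ($K_1 = 1$, $K_2 = 0$) can be verified by hand. Writing the unknown entries as $a$, $b_1$, $b_2$, $c$ (labels introduced here purely for illustration), premultiplication by $\Phi_0^{\#-1}$ gives:

```latex
\Phi_0^{\#} = \begin{bmatrix} 1 & 0 \\ c & 1 \end{bmatrix}, \qquad
\Phi_0^{\#-1} = \begin{bmatrix} 1 & 0 \\ -c & 1 \end{bmatrix}, \qquad
\Phi_1^{\#} = \begin{bmatrix} a & 0 \\ 0 & 0 \end{bmatrix}, \qquad
\Theta_1^{\#} = \begin{bmatrix} b_1 & b_2 \\ 0 & 0 \end{bmatrix},
\\[6pt]
[\Phi_1, \Theta_1] = \Phi_0^{\#-1}\,[\Phi_1^{\#}, \Theta_1^{\#}]
= \begin{bmatrix} a & 0 & b_1 & b_2 \\ -ca & 0 & -cb_1 & -cb_2 \end{bmatrix}.
```

The second row is $-c$ times the first, so the $2 \times 4$ matrix $[\Phi_1, \Theta_1]$ has rank 1 whenever its first row is nonzero.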


Software Implementation. In practical applications, the Kronecker index approach to 
model specification can be implemented using the commands Kronid, Kronfit, and 
refKronfit available in the MTS package of R. The specification of the Kronecker indices is 
performed using the command Kronid and is based on canonical correlation analysis. With 
the Kronecker indices specified, the VARMA parameters are estimated using the command 
Kronfit. Parameters with nonsignificant estimates can be removed using the command 
refKronfit. For further discussion and for demonstrations of the individual commands, see 
Tsay (2014). 


14.7.2 An Empirical Example 

To illustrate the model specification approach described above, we return to the bivariate 
time series of U.S. fixed investment and change in business inventories analyzed earlier 
in this chapter. A bivariate VAR(2) model was fitted to the series in Section 14.2.7. As 
an alternative, we now consider the possibility of a mixed VARMA model for these data 
through determination of the echelon canonical ARMA model for the two series. The 
Kronecker indices $\{K_i\}$ for the process are determined using the canonical correlation 
method suggested by Akaike (1976) and Cooper and Wood (1982); see also Tsay (2014, 
Section 4.4). For the vector of present and past values, we use a maximum of three 
time-lagged vector variables and set $P_t = (Z_t', Z_{t-1}', Z_{t-2}')'$. Then, for various vectors $F_{t+1}^*$ 
of future variables, the squared sample canonical correlations between $F_{t+1}^*$ and $P_t$ are 
determined as the eigenvalues of the matrix similar to the matrix in (6.2.6) of Section 
6.2.4. The canonical correlation analysis calculations are performed sequentially by adding 
variables to $F_{t+1}^*$ one at a time, starting with $F_{t+1}^* = (z_{1,t+1})$, until k = 2 near zero sample 
canonical correlations between $P_t$ and $F_{t+1}^*$ are determined. At each step, a likelihood ratio 
test is used to determine the significance of the smallest squared canonical correlation. 

The calculations can be performed using the MTS package in R. If zz denotes the two 
time series, the command for determining the Kronecker indices is Kronid(zz, plag=3), 
where plag represents the number of lagged vectors in $P_t$. The resulting squared sample 
canonical correlations between $P_t$ and various future vectors $F_{t+1}^*$ are presented in Table 14.3. 
From these results, we note that the first occurrence of a small squared sample canonical 
correlation value (0.044), indicative of a zero canonical correlation between the future and 
the present and past, is obtained when $F_{t+1}^* = (z_{1,t+1}, z_{2,t+1}, z_{1,t+2})'$. This indicates that the 
Kronecker index $K_1 = 1$, since it implies that a linear combination involving $z_{1,t+2}$ in terms 
of the remaining variables in $F_{t+1}^*$, that is, of the form $z_{1,t+2} - \phi_1(1)'Z_{t+1}$, is uncorrelated 
TABLE 14.3 Specification of Kronecker Indices for First Differences of U.S. Fixed Investment 
Data and Changes in Business Inventories Data 

Future Vector $F_{t+1}^*$              Smallest Squared        LR Test   Degrees of   p-Value   Kronecker 
                                     Canonical Correlation             Freedom                Index 
$z_{1,t+1}$                          0.371                   44.02     6            0.000 
$z_{1,t+1}, z_{2,t+1}$               0.369                   43.50     5            0.000 
$z_{1,t+1}, z_{2,t+1}, z_{1,t+2}$    0.044                    4.13     4            0.389     $K_1 = 1$ 
$z_{1,t+1}, z_{2,t+1}, z_{2,t+2}$    0.069                    6.20     4            0.185     $K_2 = 1$ 

with the present and past vector $P_t$. An additional small squared canonical correlation 
value of 0.069 occurs when $F_{t+1}^* = (z_{1,t+1}, z_{2,t+1}, z_{2,t+2})'$, and this implies that we may 
have $K_2 = 1$. Hence, this leads to specification of a VARMA(1, 1) model in the echelon 
form of equation (14.7.1) with Kronecker indices $K_1 = K_2 = 1$. This echelon model form 
is, in fact, the same as the standard VARMA(1, 1) model in (14.4.1); that is, $K_1 = K_2 = 1$ 
implies that we have $\Phi_0^{\#} = \Theta_0^{\#} = I$ in (14.7.1). 

The canonical correlation analysis suggests that a VARMA(1, 1) model might be 
essentially equivalent to the VAR(2) model in terms of fit, and that these two models are likely 
superior to other models considered. The parameters of the VARMA(1, 1) model were 
estimated using the Kronfit routine available in the MTS package of R, and the results 
(with standard errors in parentheses) are given as 

$$\hat{\Phi}_1 = \begin{bmatrix} 0.440 & -0.200 \\ (0.176) & (0.063) \\ 0.637 & 0.775 \\ (0.210) & (0.076) \end{bmatrix}, \qquad \hat{\Theta}_1 = \begin{bmatrix} -0.030 & -0.309 \\ (0.209) & (0.081) \\ 0.313 & 0.227 \\ (0.284) & (0.129) \end{bmatrix}, \qquad \hat{\Sigma} = \begin{bmatrix} 5.0239 & 1.6697 \\ 1.6697 & 16.8671 \end{bmatrix}$$

with $|\hat{\Sigma}| = 81.9498$, and AIC = 4.608. Again, the coefficient estimate in the (1, 1) position 
of the matrix $\Theta_1$, as well as the estimates in the second row of $\Theta_1$, are not significant and might 
be omitted from the model. 

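It is clear from these estimation results, particularly from the estimates $\hat{\Sigma}$ and associated 
summary measures, that the VARMA(1, 1) model provides a nearly equivalent fit to the 
VAR(2) model. For instance, we consider the coefficient matrices $\Psi_j$ in the infinite VMA 
representation for $Z_t$ implied by the VAR(2) and VARMA(1, 1) models. For the VAR(2) 
model, the $\Psi_j$ are determined from $\Psi_1 = \Phi_1$, 

$$\Psi_j = \Phi_1\Psi_{j-1} + \Phi_2\Psi_{j-2} \quad \text{for } j > 1 \quad (\Psi_0 = I)$$

hence, the $\Psi_j$ are given as 

$$\Psi_1 = \begin{bmatrix} 0.50 & 0.11 \\ 0.34 & 0.53 \end{bmatrix}, \qquad \Psi_2 = \begin{bmatrix} 0.15 & -0.09 \\ 0.61 & 0.46 \end{bmatrix}, \qquad \Psi_3 = \begin{bmatrix} -0.00 & -0.12 \\ 0.55 & 0.31 \end{bmatrix}$$

$$\Psi_4 = \begin{bmatrix} -0.09 & -0.11 \\ 0.41 & 0.16 \end{bmatrix}, \qquad \Psi_5 = \begin{bmatrix} -0.11 & -0.08 \\ 0.26 & 0.06 \end{bmatrix}, \qquad \Psi_6 = \begin{bmatrix} -0.10 & -0.05 \\ 0.14 & -0.00 \end{bmatrix}$$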
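and so on, while those for the VARMA(1, 1) model are determined from $\Psi_1 = \Phi_1 - \Theta_1$, 
$\Psi_j = \Phi_1\Psi_{j-1}$, $j > 1$, and so are given as 

$$\Psi_1 = \begin{bmatrix} 0.47 & 0.11 \\ 0.32 & 0.55 \end{bmatrix}, \qquad \Psi_2 = \begin{bmatrix} 0.14 & -0.06 \\ 0.55 & 0.49 \end{bmatrix}, \qquad \Psi_3 = \begin{bmatrix} -0.05 & -0.13 \\ 0.52 & 0.34 \end{bmatrix}$$

$$\Psi_4 = \begin{bmatrix} -0.12 & -0.12 \\ 0.37 & 0.19 \end{bmatrix}, \qquad \Psi_5 = \begin{bmatrix} -0.13 & -0.09 \\ 0.21 & 0.07 \end{bmatrix}, \qquad \Psi_6 = \begin{bmatrix} -0.10 & -0.05 \\ 0.08 & -0.01 \end{bmatrix}$$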


Thus, we see that the $\Psi_j$ coefficient matrices are very similar for both models, implying, 
in particular, that forecasts $\hat{Z}_t(l)$ and the covariance matrices $\Sigma(l) = \sum_{j=0}^{l-1} \Psi_j \Sigma \Psi_j'$ of the 
l-step-ahead forecast errors $e_t(l) = Z_{t+l} - \hat{Z}_t(l)$ obtained from the two models, VAR(2) 
and VARMA(1, 1), are nearly identical. 
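
The VARMA(1, 1) weights quoted above can be reproduced from the reported coefficient estimates with the recursion $\Psi_1 = \Phi_1 - \Theta_1$, $\Psi_j = \Phi_1\Psi_{j-1}$. The following sketch does this in plain Python; the function names are hypothetical, and the inputs are the three-decimal estimates given in the text, so agreement is to be expected only up to rounding.

```python
# Sketch: Psi weights of the fitted VARMA(1, 1) model from the reported estimates.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][m] * b[m][j] for m in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def varma11_psi(phi1, theta1, n):
    """Return [Psi_1, ..., Psi_n] with Psi_1 = Phi_1 - Theta_1 and Psi_j = Phi_1 Psi_{j-1}."""
    psi = [[[p - t for p, t in zip(pr, tr)] for pr, tr in zip(phi1, theta1)]]
    while len(psi) < n:
        psi.append(matmul(phi1, psi[-1]))
    return psi

phi1 = [[0.440, -0.200], [0.637, 0.775]]      # estimated AR matrix from the text
theta1 = [[-0.030, -0.309], [0.313, 0.227]]   # estimated MA matrix from the text

psi = varma11_psi(phi1, theta1, 6)
for j, weight in enumerate(psi, start=1):
    print(j, [[round(x, 2) for x in row] for row in weight])
```

Replacing the recursion with $\Psi_j = \Phi_1\Psi_{j-1} + \Phi_2\Psi_{j-2}$ and supplying the VAR(2) estimates from Section 14.2.7 would give the corresponding comparison weights.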

14.7.3 Partial Canonical Correlation Analysis for Reduced-Rank Structure 

Another approach to allow for simplifying structure in the parameterization of the VAR 
and VARMA models is to incorporate certain reduced-rank structure in the coefficient 
matrices. For the VAR(p) model (14.2.1), Ahn and Reinsel (1988) proposed a particular 
nested reduced-rank model structure, such that 

$$\mathrm{rank}(\Phi_j) = r_j \ge \mathrm{rank}(\Phi_{j+1}) = r_{j+1}, \qquad j = 1, 2, \ldots, p - 1$$

and it is also specified that range$(\Phi_j) \supset$ range$(\Phi_{j+1})$. Then the $\Phi_j$ can be represented in 
reduced-rank factorization form as $\Phi_j = A_j B_j$, where $A_j$ and $B_j$ are full-rank matrices 
of dimensions $k \times r_j$ and $r_j \times k$, respectively, with range$(A_j) \supset$ range$(A_{j+1})$. One 
fundamental consequence for this model is that there then exists a full-rank $(k - r_j) \times k$ matrix 
$F_j'$, such that $F_j'\Phi_j = 0$ and hence $F_j'\Phi_i = 0$ for all $i \ge j$ because of the nested structure. 
Therefore, the vector 

$$F_j'\Big(Z_t - \sum_{i=1}^{j-1} \Phi_i Z_{t-i}\Big) = F_j'\Big(Z_t - \sum_{i=1}^{p} \Phi_i Z_{t-i}\Big) \equiv F_j'\delta + F_j'a_t$$

is uncorrelated with the past values $Z_{j-1,t-1} = (Z_{t-1}', \ldots, Z_{t-j}')'$ and consists of $k - r_j$ 
linear combinations of $Z_{j-1,t} = (Z_t', \ldots, Z_{t-j+1}')'$. Thus, it follows that $k - r_j$ zero partial 
canonical correlations will occur between $Z_t$ and $Z_{t-j}$, given $Z_{t-1}, \ldots, Z_{t-j+1}$. Hence, 
performing a (partial) canonical correlation analysis for the various values of $j = 1, 2, \ldots$ 
can identify the simplifying nested reduced-rank structure, as well as the overall order p, 
of the VAR model. 

The sample statistic that can be used to (tentatively) specify the ranks is 

$$C(j, r) = -(N - j - jk - 1) \sum_{i=r+1}^{k} \ln[1 - \hat{\rho}_i^2(j)] \tag{14.7.8}$$

for $r = k - 1, k - 2, \ldots, 0$, where $1 \ge \hat{\rho}_1(j) \ge \cdots \ge \hat{\rho}_k(j) \ge 0$ are the sample partial 
canonical correlations between $Z_t$ and $Z_{t-j}$, given $Z_{t-1}, \ldots, Z_{t-j+1}$. (Calculation of sample 
canonical correlations was discussed previously in Section 6.2.4.) Under the null hypothesis 
that rank$(\Phi_j) \le r$ within the nested reduced-rank model framework, the statistic $C(j, r)$ is 
asymptotically distributed as chi-squared with $(k - r)^2$ degrees of freedom. Hence, if the 
value of the test statistic is not “significantly” large, we would not reject the null hypothesis 
and might conclude that $\Phi_j$ has reduced rank equal to the smallest value $r_j$ for which the test 
does not reject the null hypothesis. Note, in particular, that when r = 0 the statistic in (14.7.8) 
is (essentially) the same as the LR test statistic given in (14.2.10) for testing $H_0$: $\Phi_j = 0$ 
in a VAR(j) model, since it can be verified that $\ln[|S_j|/|S_{j-1}|] = \sum_{i=1}^{k} \ln[1 - \hat{\rho}_i^2(j)]$. 
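
The statistic (14.7.8) is simple to evaluate once the sample partial canonical correlations are available. The sketch below computes $C(j, r)$ and its chi-squared degrees of freedom; the correlations, the sample size, and the function name are hypothetical values chosen for illustration, and the comparison against a chi-squared critical value is left to the reader.

```python
# Sketch of the rank test statistic C(j, r) in (14.7.8).
# The correlations and sample size below are hypothetical.
import math

def rank_test_statistic(rho, n_obs, j, r):
    """Return (C(j, r), degrees of freedom) for sample partial canonical
    correlations rho = [rho_1, ..., rho_k], sorted in decreasing order."""
    k = len(rho)
    # Sum over i = r + 1, ..., k (1-based), i.e. the k - r smallest correlations.
    log_sum = sum(math.log(1.0 - rho[i] ** 2) for i in range(r, k))
    statistic = -(n_obs - j - j * k - 1) * log_sum
    return statistic, (k - r) ** 2

rho = [0.6, 0.1]   # hypothetical partial canonical correlations at lag j = 1
stat_r1, df_r1 = rank_test_statistic(rho, n_obs=100, j=1, r=1)   # tests rank <= 1
stat_r0, df_r0 = rank_test_statistic(rho, n_obs=100, j=1, r=0)   # tests rank = 0
print(stat_r1, df_r1)
print(stat_r0, df_r0)
```

With these hypothetical inputs the statistic for r = 1 is small relative to a chi-squared distribution with 1 degree of freedom, while the statistic for r = 0 is large relative to 4 degrees of freedom, which is the pattern that would point to a rank of one.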

Once the ranks in the nested reduced-rank VAR model have been specified, the 
parameters in the restricted model can be estimated by maximum likelihood methods. Some 
normalization conditions on the $A_j$ and $B_j$ in $\Phi_j = A_j B_j$ are required to ensure a unique set 
of parameters. Assuming the components of $Z_t$ are arranged suitably, this parameterization 
can be obtained as $\Phi_j = A_1 D_j B_j$, where $A_1$ is $k \times r_1$ lower triangular with ones on the 
main diagonal and may have certain other elements “normalized” to fixed values of zero, 
$B_j$ contains unrestricted parameters, and $D_j = [I_{r_j}, 0]'$ is $r_1 \times r_j$. Asymptotic distribution 
theory for the ML estimators of parameters of this model extends from theory for the LS 
estimators in a stationary VAR(p) model in a fairly direct manner. 

The structure of the reduced-rank VAR model relates directly to the concepts of 
Kronecker indices, McMillan degree, and echelon canonical form of VARMA models discussed 
earlier. In particular, it can be easily verified that the McMillan degree of a nested 
reduced-rank AR process is equal to $M = \sum_{j=1}^{p} r_j$, the sum of the ranks of the AR coefficient 
matrices $\Phi_j$. In addition, from the nested reduced-rank structure it follows that the model 
can also be represented as 

$$\Phi_0^{\#} Z_t - \sum_{j=1}^{p} \Phi_j^{\#} Z_{t-j} = \delta^{\#} + \Phi_0^{\#} a_t$$

with $\Phi_0^{\#} = A^{-1}$, where A is the $k \times k$ matrix formed by augmenting the $k \times r_1$ matrix $A_1$ 
with the last $k - r_1$ columns of the $k \times k$ identity matrix, and 

$$\Phi_j^{\#} = A^{-1}\Phi_j = A^{-1}A_1 D_j B_j = [B_j', 0']'$$

having its last $k - r_j$ rows equal to zero. This relation can be viewed as an echelon 
canonical form representation, as in (14.7.1), for the nested reduced-rank VAR(p) 
model. Also, as noted by Reinsel (1997, p. 66), the notion of a nested reduced-rank model 
and its relationship to the echelon form representation can be directly extended to the 
VARMA model leading to the specification of a reduced-rank VARMA model for the 
vector process. 


14.8 NONSTATIONARITY AND COINTEGRATION 
14.8.1 Vector ARIMA Models 

Time series encountered in practice will frequently exhibit nonstationary behavior. To
generalize stationary VARMA models to nonstationary processes, we can consider a general
form of the VARMA model, $\Phi(B) Z_t = \Theta(B) a_t$, where some of the roots of $\det\{\Phi(B)\} = 0$
are allowed to have absolute value equal to one. More specifically, because of the prominent
role of the differencing operator $(1 - B)$ in univariate models, for nonseasonal time series
we might only allow some roots to equal one (unit roots) while the remaining roots are all
greater than one in absolute value. A particular restrictive class of models of this type for
nonstationary series are of the form

$$\Phi_1(B) D(B) Z_t = \Theta(B) a_t \tag{14.8.1}$$

where $D(B) = \mathrm{diag}[(1 - B)^{d_1}, \ldots, (1 - B)^{d_k}]$ is a diagonal matrix, $d_1, \ldots, d_k$ are nonnegative
integers, and $\det\{\Phi_1(B)\} = 0$ has all roots greater than one in absolute value. Thus, this
model, which is referred to as a vector ARIMA model, simply states that after each series $z_{it}$
is individually differenced an appropriate number ($d_i$) of times to reduce it to a stationary
series, the resulting vector series $W_t = D(B) Z_t$ is a stationary VARMA(p, q) process. For
vector time series, however, simultaneous differencing of all component series can lead to 
unnecessary complications in modeling and estimation as a result of “overdifferencing,” 
including noninvertible model representations, so differencing needs to be examined with 
particular care in the vector case. 
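
The component-wise differencing $W_t = D(B) Z_t$ in (14.8.1) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `difference` and `vector_difference` and the two short data series are assumptions made for the example and do not come from the text.

```python
def difference(series, d):
    """Apply (1 - B)^d to a list of numbers; the result is d values shorter."""
    out = list(series)
    for _ in range(d):
        out = [out[t] - out[t - 1] for t in range(1, len(out))]
    return out


def vector_difference(components, orders):
    """Apply D(B) = diag[(1-B)^{d_1}, ..., (1-B)^{d_k}] component by component.

    The differenced components are trimmed to their common final time points
    so that the rows of the result stay aligned in time.
    """
    diffed = [difference(z, d) for z, d in zip(components, orders)]
    n = min(len(w) for w in diffed)
    return [w[len(w) - n:] for w in diffed]


# Hypothetical data: z1 needs two differences, z2 needs one.
z1 = [1.0, 3.0, 6.0, 10.0, 15.0]
z2 = [2.0, 4.0, 6.0, 8.0, 10.0]
w = vector_difference([z1, z2], [2, 1])
print(w)
```

Each component receives its own order $d_i$; whether those orders are appropriate for the vector series as a whole is exactly the overdifferencing question raised above.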

14.8.2 Cointegration in Nonstationary Vector Processes 

The nonstationary unit-root aspects of a vector process $Z_t$ become more complicated in
the multivariate case compared with the univariate case, due in part to the possibility of
cointegration among the component series $z_{it}$ of a nonstationary vector process $Z_t$. For
instance, the possibility exists for each component series $z_{it}$ to be nonstationary with its
first difference $(1 - B) z_{it}$ stationary (in which case $z_{it}$ is said to be integrated of order
one), but such that certain linear combinations $y_{it} = b_i' Z_t$ of $Z_t$ will be stationary. That
this possibility exists was demonstrated by Box and Tiao (1977) in their analysis of a
five-dimensional dataset from Quenouille (1957). A process $Z_t$ that displays this behavior
is said to be cointegrated with cointegrating vectors $b_i$ (e.g., Engle and Granger, 1987).
An interpretation of cointegrated vector processes $Z_t$ is that the individual components
$z_{it}$ share some common nonstationary components or “common trends”; hence, they tend
to have certain similar movements in their longer term behavior. These common trend
components will be eliminated upon taking suitable linear combinations of the components
of the process $Z_t$. A related interpretation is that the component series $z_{it}$, although they
may exhibit nonstationary behavior, satisfy a long-run equilibrium relation $b_i' Z_t \approx 0$ such
that the process $y_{it} = b_i' Z_t$, which represents the deviation from the equilibrium, exhibits
stable behavior and so forms a stationary process. Properties of nonstationary cointegrated 
systems have been investigated by Engle and Granger (1987) and Johansen (1988), among 
others. 

An Error Correction Form. A specific nonstationary VARMA model structure for which
cointegration occurs is the model $\Phi(B) Z_t = \Theta(B) a_t$, where $\det\{\Phi(B)\} = 0$ has $d < k$
roots equal to one and all other roots are greater than one in absolute value, and also the
matrix

$$\Phi(1) = I - \Phi_1 - \cdots - \Phi_p$$

has rank $r = k - d$. Because the process has fewer unit roots than components,
this type of process is called partially nonstationary by Ahn and Reinsel (1990).
For such a process, it can be established that $r$ linearly independent vectors $b_i$ exist such
that $b_i' Z_t$ is stationary, and $Z_t$ is said to have cointegrating rank $r$. A useful approach to
the investigation of this model is to express it in its equivalent error correction (EC) form
given by

$$W_t = C Z_{t-1} + \sum_{j=1}^{p-1} \Phi_j^{*} W_{t-j} + a_t - \sum_{j=1}^{q} \Theta_j a_{t-j} \tag{14.8.2}$$

where $W_t = (1 - B) Z_t$, $\Phi_j^{*} = -\sum_{i=j+1}^{p} \Phi_i$, and

$$C = -\Phi(1) = -\left( I - \sum_{j=1}^{p} \Phi_j \right) \tag{14.8.3}$$

For instance, by subtracting $Z_{t-1}$ from both sides of the VAR(1) model $Z_t = \Phi Z_{t-1} + a_t$,
we see that the model can be expressed as $(Z_t - Z_{t-1}) = -(I - \Phi) Z_{t-1} + a_t = C Z_{t-1} + a_t$,
with $C = -(I - \Phi)$. The VAR(2) model can be expressed as

$$\begin{aligned}
(Z_t - Z_{t-1}) &= -(I - \Phi_1 - \Phi_2) Z_{t-1} - \Phi_2 (Z_{t-1} - Z_{t-2}) + a_t \\
&= C Z_{t-1} + \Phi_1^{*} (Z_{t-1} - Z_{t-2}) + a_t
\end{aligned}$$

with $C = -(I - \Phi_1 - \Phi_2)$ and $\Phi_1^{*} = -\Phi_2$, and similarly for higher order VAR models.
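
The mapping from VAR coefficients to the error correction coefficients in (14.8.2) and (14.8.3) can be checked numerically. The following is a minimal sketch in plain Python; the helper names and the VAR(2) coefficient values are hypothetical and chosen only for illustration.

```python
def mat_add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mat_scale(a, c):
    return [[c * x for x in row] for row in a]


def mat_vec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def error_correction_coefficients(phis):
    """Map [Phi_1, ..., Phi_p] to (C, [Phi*_1, ..., Phi*_{p-1}])."""
    k = len(phis[0])
    zero = [[0.0] * k for _ in range(k)]
    total = zero
    for phi in phis:
        total = mat_add(total, phi)
    identity = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    c = mat_add(total, mat_scale(identity, -1.0))  # C = -(I - sum_j Phi_j)
    phi_star = []
    for j in range(1, len(phis)):
        tail = zero
        for phi in phis[j:]:  # Phi_{j+1}, ..., Phi_p
            tail = mat_add(tail, phi)
        phi_star.append(mat_scale(tail, -1.0))  # Phi*_j = -sum_{i>j} Phi_i
    return c, phi_star


# Hypothetical VAR(2) coefficients, chosen only for illustration.
phi1 = [[0.5, 0.2], [0.1, 0.4]]
phi2 = [[0.3, 0.0], [0.1, 0.2]]
c, phi_star = error_correction_coefficients([phi1, phi2])

# With a_t = 0, both forms must give the same Z_t from Z_{t-1} and Z_{t-2}.
z_lag1, z_lag2 = [1.0, 2.0], [0.5, -1.0]
z_var = [x + y for x, y in zip(mat_vec(phi1, z_lag1), mat_vec(phi2, z_lag2))]
w_lag1 = [x - y for x, y in zip(z_lag1, z_lag2)]
w_t = [x + y for x, y in zip(mat_vec(c, z_lag1), mat_vec(phi_star[0], w_lag1))]
z_ec = [x + y for x, y in zip(z_lag1, w_t)]
print(z_var, z_ec)
```

The two printed vectors agree up to rounding, which is the VAR(2) identity displayed above evaluated at particular numbers.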

We note that the error correction form (14.8.2) has an invertible moving average operator
but introduces $C Z_{t-1}$ on the right-hand side of the model. Since the moving average
operator remains unchanged, problems associated with noninvertibility are now avoided.
The term $C Z_{t-1}$ is referred to as the error correction term and the rank $r = k - d$ of the
coefficient matrix $C$ represents the number of cointegrating vectors in the system.

To derive an alternative form, we note that the reduced-rank matrix $C$ can be written
as $C = AB$, where $A$ and $B$ are full-rank matrices of dimensions $k \times r$ and $r \times k$, respectively.
We can also determine a full-rank $k \times (k - r)$ matrix $Q_1$ such that $Q_1' A = 0$, hence
also $Q_1' C = 0$. Hence, it can be established that the $r$ linear combinations $Y_{2t} = B Z_t$
are stationary, the $r$ rows of $B$ are linearly independent cointegrating vectors, whereas the
$d = k - r$ components $Y_{1t} = Q_1' Z_t$ are “purely” nonstationary and are often referred to as
the “common trends” among the components of the nonstationary process $Z_t$. Therefore,
the error correction form (14.8.2) can also be expressed as

$$\begin{aligned}
W_t &= A B Z_{t-1} + \sum_{j=1}^{p-1} \Phi_j^{*} W_{t-j} + a_t - \sum_{j=1}^{q} \Theta_j a_{t-j} \\
&= A Y_{2,t-1} + \sum_{j=1}^{p-1} \Phi_j^{*} W_{t-j} + a_t - \sum_{j=1}^{q} \Theta_j a_{t-j}
\end{aligned} \tag{14.8.4}$$

Issues of estimation of cointegrated VAR models and testing for the rank r of cointegration 
will be discussed briefly in Section 14.8.3. 

Illustration: Nonstationary VAR(1) Model. To illustrate some of the preceding points,
consider the VAR(1) process $Z_t = \Phi Z_{t-1} + a_t$ with $d$ eigenvalues of $\Phi$ equal to one and
the remaining $r = k - d$ eigenvalues less than one in absolute value, and suppose the $d$ unit
eigenvalues have $d$ linearly independent eigenvectors. Then there is a $k \times k$ nonsingular
matrix $P$ such that $P^{-1} \Phi P = \Lambda$ with $\Lambda = \mathrm{diag}(I_d, \Lambda_2)$, where $\Lambda_2 = \mathrm{diag}(\lambda_{d+1}, \ldots, \lambda_k)$ is an
$r \times r$ diagonal matrix with $|\lambda_i| < 1$. Letting $P = [P_1, P_2]$ and $Q = P^{-1} = [Q_1, Q_2]'$, where
$P_1$ and $Q_1$ are $k \times d$ matrices, define $Y_t = Q Z_t = (Y_{1t}', Y_{2t}')'$, that is, $Y_{1t} = Q_1' Z_t$ and
$Y_{2t} = Q_2' Z_t$, and similarly $\varepsilon_t = Q a_t = (\varepsilon_{1t}', \varepsilon_{2t}')'$. Then we have

$$Q Z_t = Q \Phi P Q Z_{t-1} + Q a_t$$

or $Y_t = \Lambda Y_{t-1} + \varepsilon_t$. Therefore, the model in terms of $Y_t$ reduces to

$$(1 - B) Y_{1t} = \varepsilon_{1t} \quad \text{and} \quad (I - \Lambda_2 B) Y_{2t} = \varepsilon_{2t}$$

so $\{Y_{1t}\}$ is a $d$-dimensional purely nonstationary series, whereas $\{Y_{2t}\}$ is an $r$-dimensional
stationary series. Thus, $\{Z_t\}$ is nonstationary but has $r$ linearly independent linear
combinations $Y_{2t} = Q_2' Z_t$, which are stationary, so $Z_t$ is cointegrated with cointegrating rank $r$
and linearly independent cointegrating vectors, which are the rows of $Q_2'$. Conversely, since
$Z_t = P Y_t = P_1 Y_{1t} + P_2 Y_{2t}$, the components of the vector $Z_t$ are linear combinations of
a nonstationary vector (random walk) component $Y_{1t}$ and a stationary VAR(1) component
$Y_{2t}$. Also notice that the error correction form of this VAR(1) model as in (14.8.2) is

$$W_t = Z_t - Z_{t-1} = C Z_{t-1} + a_t$$

where

$$C = -(I - \Phi) = -P (I - \Lambda) Q = -P_2 (I_r - \Lambda_2) Q_2'$$

which is clearly of reduced rank $r$.
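
A small numerical case may make the illustration concrete. The sketch below uses a hypothetical bivariate coefficient matrix $\Phi$ (not taken from the text) with eigenvalues 1 and 0.4, so that $k = 2$ and $d = r = 1$, and runs the noise-free recursion $Z_t = \Phi Z_{t-1}$ to separate the stationary combination from the common trend.

```python
import math


def eigenvalues_2x2(m):
    """Eigenvalues of a 2 x 2 matrix, assuming they are real."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    root = math.sqrt(tr * tr - 4.0 * det)
    return (tr + root) / 2.0, (tr - root) / 2.0


def mat_vec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]


# Hypothetical coefficient matrix with one unit eigenvalue.
phi = [[0.6, 0.4], [0.2, 0.8]]
lam1, lam2 = eigenvalues_2x2(phi)

# Error correction matrix C = -(I - Phi); its determinant is zero (rank 1).
c = [[phi[i][j] - (1.0 if i == j else 0.0) for j in range(2)] for i in range(2)]
det_c = c[0][0] * c[1][1] - c[0][1] * c[1][0]

b = [1.0, -1.0]   # cointegrating vector: C = A b' with A = (-0.4, 0.2)'
q1 = [1.0, 2.0]   # common-trend weights, orthogonal to A

# Noise-free recursion Z_t = Phi Z_{t-1} from a hypothetical starting value.
z = [3.0, 0.0]
path = [z]
for _ in range(5):
    z = mat_vec(phi, z)
    path.append(z)

equilibrium_error = [b[0] * v[0] + b[1] * v[1] for v in path]
common_trend = [q1[0] * v[0] + q1[1] * v[1] for v in path]
print(lam1, lam2, det_c)
print(equilibrium_error)
print(common_trend)
```

In this example the combination $z_{1t} - z_{2t}$ decays geometrically at rate 0.4, playing the role of $Y_{2t}$, while $z_{1t} + 2 z_{2t}$ stays constant in the absence of noise, playing the role of the random walk component $Y_{1t}$.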

14.8.3 Estimation and Inferences for Cointegrated VAR Models 

As noted above, when the vector series $Z_t$ is unit-root nonstationary but with cointegration
features, it is not appropriate to difference all component series and model the resulting
series $W_t = (1 - B) Z_t$ by a VAR or VARMA model. Instead, we may prefer to incorporate
the unit-root and cointegration features into the analysis using the model (14.8.2). This
can provide a better understanding of the nature of the nonstationarity and improve the
forecasting performance of the model. This section examines the estimation and statistical
inference for this model focusing on the special case of an error correction VAR(p) model

$$W_t = C Z_{t-1} + \sum_{j=1}^{p-1} \Phi_j^{*} W_{t-j} + a_t \tag{14.8.5}$$

where $W_t = Z_t - Z_{t-1}$ with $\mathrm{rank}(C) = r < k$. Note that a special case of (14.8.5), at one
extreme, occurs with $r = 0$ (i.e., $d = k$ unit roots and $C = 0$) and leads to a usual VAR
model of order $p - 1$ for the series of first differences $W_t$.

The least-squares and Gaussian maximum likelihood estimation of cointegrated VAR 
models and likelihood ratio testing for the rank of cointegration, generally utilizing the error 
correction form (14.8.5) of the model, have been examined by several authors including 
Johansen (1988,1991), Johansen and Juselius (1990), Ahn and Reinsel (1990), and Reinsel 
and Ahn (1992). Estimation of the cointegrated model, which imposes the restriction on 
the number $d$ of unit roots in $\Phi(B)$ (or the number $r = k - d$ of cointegrating relations),
is equivalent to reduced-rank estimation, which imposes the restriction on the rank $r$ of
the coefficient matrix $C$, which can be written as $C = AB$ as noted in Section 14.8.2. So
techniques from reduced-rank estimation of multivariate regression models can be utilized. 

When there are no additional constraints on the coefficient matrices $\Phi_j^{*}$ in (14.8.5),
that is, the only parameter constraints involved in the model are $\mathrm{rank}(C) \le r$, it follows
from the original work by Anderson (1951) on reduced-rank regression that the Gaussian
(ML) reduced-rank estimation can be obtained explicitly through the partial canonical
correlation analysis between $W_t$ and $Z_{t-1}$, given $W_{t-1}, \ldots, W_{t-p+1}$. When there are
additional constraints, however, iterative numerical techniques are needed for the Gaussian
ML estimation. Specifically, in the partial canonical correlation analysis approach, let $\tilde{W}_t$
and $\tilde{Z}_{t-1}$ denote the residual vectors from least-squares regressions of $W_t$ and $Z_{t-1}$,
respectively, on the lagged values $W_{t-1}, \ldots, W_{t-p+1}$, and let

$$S_{\tilde{w}\tilde{w}} = \sum_{t=1}^{N} \tilde{W}_t \tilde{W}_t' \qquad S_{\tilde{w}\tilde{z}} = \sum_{t=1}^{N} \tilde{W}_t \tilde{Z}_{t-1}' \qquad S_{\tilde{z}\tilde{z}} = \sum_{t=1}^{N} \tilde{Z}_{t-1} \tilde{Z}_{t-1}'$$

Then the Gaussian reduced-rank estimator of $C$ in model (14.8.5) can be expressed
explicitly as

$$\hat{C} = \tilde{\Sigma} \hat{V} \hat{V}' \tilde{C} \tag{14.8.6}$$

where $\tilde{C} = S_{\tilde{w}\tilde{z}} S_{\tilde{z}\tilde{z}}^{-1}$ is the full-rank LS estimator of $C$, $\tilde{\Sigma}$ is the corresponding residual
covariance matrix estimate of $\Sigma = \mathrm{cov}[a_t]$ from the full-rank LS estimation of (14.8.5), and
$\hat{V} = [\hat{V}_1, \ldots, \hat{V}_r]$ are the vectors corresponding to the $r$ largest partial canonical correlations
$\hat{\rho}_i(p)$, $i = 1, \ldots, r$. The vectors $\hat{V}_i$ are normalized so that $\hat{V}' \tilde{\Sigma} \hat{V} = I_r$. Note that the form of
the estimator (14.8.6) provides the reduced-rank factorization as $\hat{C} = (\tilde{\Sigma} \hat{V})(\hat{V}' \tilde{C}) = \hat{A} \hat{B}$,
with $\hat{A} = \tilde{\Sigma} \hat{V}$ satisfying the normalization $\hat{A}' \tilde{\Sigma}^{-1} \hat{A} = I_r$.

The asymptotic distribution theory of the LS and reduced-rank estimators, $\tilde{C}$ and $\hat{C}$, and
of LR test statistics for rank has been established and the limiting distributions represented
as functionals of vector Brownian motion processes, extending the “nonstandard” unit-root
asymptotic distribution theory for univariate AR models as outlined in Section 10.1.
In particular, we discuss the LR statistic for the test of the hypothesis $H_0: \mathrm{rank}(C) \le r$ for
model (14.8.5). The LR test statistic is given by

$$-N \ln(U) = -N \ln\left( \frac{|S|}{|S_0|} \right)$$

where $S$ denotes the residual sum-of-squares matrix in the full-rank LS estimation (such
that $\tilde{\Sigma} = N^{-1} S$), while $S_0$ is the residual sum-of-squares matrix obtained under the
reduced-rank restriction that $\mathrm{rank}(C) = r$. Again from the work of Anderson (1951), it is
established that

$$S_0 = S + (\tilde{C} - \hat{C}) S_{\tilde{z}\tilde{z}} (\tilde{C} - \hat{C})'$$

and it follows that

$$|S_0| = |S| \prod_{i=r+1}^{k} [1 - \hat{\rho}_i^2(p)]^{-1}$$

where the $\hat{\rho}_i(p)$ are the $d = k - r$ smallest sample partial canonical correlations between
$W_t$ and $Z_{t-1}$, given $W_{t-1}, \ldots, W_{t-p+1}$. Therefore, the LR statistic can be expressed
equivalently as

$$-N \ln(U) = -N \sum_{i=r+1}^{k} \ln[1 - \hat{\rho}_i^2(p)] \tag{14.8.7}$$

The limiting distribution for the LR statistic has been derived, based on limiting distribution
properties for the LS and reduced-rank estimators $\tilde{C}$ and $\hat{C}$, and its limiting distribution is
represented by

$$-N \ln(U) \xrightarrow{D} \mathrm{tr}\left\{ \left[ \int_0^1 B_d(u)\, dB_d(u)' \right]' \left[ \int_0^1 B_d(u) B_d(u)'\, du \right]^{-1} \left[ \int_0^1 B_d(u)\, dB_d(u)' \right] \right\} \tag{14.8.8}$$

where $B_d(u)$ is a $d$-dimensional standard Brownian motion process, with $d = k - r$. The
limiting distribution of the LR statistic under $H_0$ depends only on $d$ and not on any nuisance
parameters or the order $p$ of the VAR model. Note that in the special case of testing for (at
least) one unit root, $d = 1$, the limiting distribution in (14.8.8) reduces to

$$-N \ln(U) \xrightarrow{D} \frac{\left[ \int_0^1 B_1(u)\, dB_1(u) \right]^2}{\int_0^1 B_1(u)^2\, du}$$

which is the asymptotic distribution for the (univariate) unit root statistic $\hat{\tau}^2$ in the univariate
AR(1) model as discussed in Section 10.1.1.

Critical values of the limiting distribution in (14.8.8) have been obtained by simulation
by Johansen (1988) and Reinsel and Ahn (1992) and can be used in the test of $H_0$. Similar
to other LR testing procedures in multivariate linear models, it is suggested that the LR
statistic in (14.8.7) be modified to $-(N - kp) \sum_{i=r+1}^{k} \ln[1 - \hat{\rho}_i^2(p)]$ for practical use in finite
samples, as this may provide a test statistic whose finite sample distribution is closer to the
limiting distribution in (14.8.8) than the “unmodified” LR test statistic. The ML estimation
and LR testing procedures and asymptotic theory are also extended to the more practical
case where a constant term $\delta$ is included in the estimation of the VAR(p) model in error
correction form, $W_t = C Z_{t-1} + \sum_{j=1}^{p-1} \Phi_j^{*} W_{t-j} + \delta + a_t$. A recommended procedure to
be used in specification of the rank $r$ of $C$ in model (14.8.5) is thus based on performing
LR tests of $H_0: \mathrm{rank}(C) \le r$ for a sequence of values of $r = k - 1, k - 2, \ldots, 1, 0$, and an
appropriate value of $r$ can be chosen as the smallest value for which $H_0$ is not rejected. For
further discussion of the model building process and for software demonstrations using the 
R package, the readers are referred to Tsay (2014). 
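
The arithmetic of the LR statistic (14.8.7) and its finite-sample modification is simple once the squared partial canonical correlations are available. The sketch below is an assumption-laden illustration: the function name `lr_rank_statistics` and the input values are hypothetical, and no critical values are included, since those must be taken from the simulated tables cited above.

```python
import math


def lr_rank_statistics(rho_sq, n, p=None):
    """LR statistics for H0: rank(C) <= r, for r = k-1, ..., 0.

    rho_sq : squared sample partial canonical correlations (any order)
    n      : sample size N
    p      : VAR order; if given, the multiplier N is replaced by N - k*p
    """
    k = len(rho_sq)
    ordered = sorted(rho_sq, reverse=True)  # largest first
    scale = n if p is None else n - k * p
    stats = {}
    for r in range(k - 1, -1, -1):
        smallest = ordered[r:]  # the d = k - r smallest values
        stats[r] = -scale * sum(math.log(1.0 - x) for x in smallest)
    return stats


# Hypothetical squared partial canonical correlations for a k = 3 system.
rho_sq = [0.50, 0.10, 0.02]
unmodified = lr_rank_statistics(rho_sq, n=100)
modified = lr_rank_statistics(rho_sq, n=100, p=2)
for r in sorted(unmodified, reverse=True):
    print(r, unmodified[r], modified[r])
```

Each statistic would then be compared with the critical value for $d = k - r$, working down from $r = k - 1$ as in the procedure described above.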
APPENDIX A14.1 SPECTRAL CHARACTERISTICS AND LINEAR
FILTERING RELATIONS FOR STATIONARY MULTIVARIATE PROCESSES 

A14.1.1 Spectral Characteristics for Stationary Multivariate Processes 

The covariance-generating matrix function (provided $\sum_{l=-\infty}^{\infty} |\gamma_{ij}(l)| < \infty$, $i, j = 1, \ldots, k$)
is defined as $G(z) = \sum_{l=-\infty}^{\infty} \Gamma(l) z^l$, and the spectral density matrix of the stationary process
$\{Z_t\}$ as a function of frequency $f$ is defined as

$$P(f) = 2 G(e^{-i 2\pi f}) = 2 \sum_{l=-\infty}^{\infty} \Gamma(l) e^{-i 2\pi f l} \qquad 0 \le f \le \tfrac{1}{2} \tag{A14.1.1}$$

The $(h, j)$th element of $P(f)$, denoted as $p_{hj}(f)$, is

$$p_{hj}(f) = 2 \sum_{l=-\infty}^{\infty} \gamma_{hj}(l) e^{-i 2\pi f l}$$

For $h = j$, $p_{jj}(f)$ is the (auto)spectral density function of the series $z_{jt}$, while for $h \ne j$,
$p_{hj}(f)$ is the cross-spectral density function of $z_{ht}$ and $z_{jt}$. Notice that $p_{jj}(f)$ is real
valued and nonnegative, but since $\gamma_{hj}(l) \ne \gamma_{hj}(-l)$ for $h \ne j$, the cross-spectral density
function $p_{hj}(f)$ is in general complex valued, with $p_{hj}(f)$ being equal to $p_{jh}(-f)$, the
complex conjugate of $p_{jh}(f)$. Therefore, the spectral density matrix $P(f)$ is Hermitian,
that is, $P(f) = P(-f)'$. Moreover, $P(f)$ is a nonnegative-definite matrix in the sense that
$b' P(f) b \ge 0$ for any $k$-dimensional (real-valued) vector $b$, since $b' P(f) b$ is the spectral
density function of the linear combination $b' Z_t$ and hence must be nonnegative. Note also
that

$$\Gamma(l) = \frac{1}{2} \int_{-1/2}^{1/2} e^{i 2\pi f l} P(f)\, df \qquad l = 0, \pm 1, \pm 2, \ldots \tag{A14.1.2}$$

that is, $\gamma_{hj}(l) = \frac{1}{2} \int_{-1/2}^{1/2} e^{i 2\pi f l} p_{hj}(f)\, df$.

The real part of $p_{hj}(f)$, denoted as $c_{hj}(f) = \mathrm{Re}\{p_{hj}(f)\}$, is called the co-spectrum,
and the negative of the imaginary part, denoted as $q_{hj}(f) = -\mathrm{Im}\{p_{hj}(f)\}$, is called the
quadrature spectrum. We can also express $p_{hj}(f)$ in polar form as $p_{hj}(f) = \alpha_{hj}(f) e^{i \phi_{hj}(f)}$,
where

$$\alpha_{hj}(f) = |p_{hj}(f)| = \{ c_{hj}^2(f) + q_{hj}^2(f) \}^{1/2}$$

and $\phi_{hj}(f) = \tan^{-1}\{ -q_{hj}(f) / c_{hj}(f) \}$. The function $\alpha_{hj}(f)$ is called the cross-amplitude
spectrum and $\phi_{hj}(f)$ is the phase spectrum.

Similar to the univariate case, the spectral density matrix $P(f)$ represents the covariance
matrix of the random vector of components at frequency $f$ in the theoretical spectral
representations of the components $z_{jt}$ of the vector process $\{Z_t\}$ corresponding to the
finite-sample Fourier representations of the time series $z_{jt}$ as in (2.2.1). The (squared)
coherency spectrum of a pair of series $z_{ht}$ and $z_{jt}$ is defined as

$$k_{hj}^2(f) = \frac{|p_{hj}(f)|^2}{p_{hh}(f)\, p_{jj}(f)}$$

The coherency $k_{hj}(f)$ at frequency $f$ can thus be interpreted as the correlation coefficient
between the random components at frequency $f$ in the theoretical spectral representations
of $z_{ht}$ and $z_{jt}$. Hence, $k_{hj}(f)$ as a function of $f$ measures the extent to which the two
processes $z_{ht}$ and $z_{jt}$ are linearly related in terms of the degree of linear association of their
random components at different frequencies $f$. When spectral relations that involve more
than two time series are considered, the related concepts of partial coherency and multiple 
coherency are also of interest. Detailed accounts of the spectral theory and analysis of 
multivariate time series may be found in the books by Jenkins and Watts (1968), Hannan 
(1970), Priestley (1981), and Bloomfield (2000). 
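
The quantities defined in this subsection are simple functions of one complex number and two real numbers at each frequency. The sketch below, in plain Python, evaluates them for hypothetical values of $p_{hj}(f)$, $p_{hh}(f)$, and $p_{jj}(f)$ at a single frequency; the function name and the numbers are assumptions made for illustration only.

```python
import cmath


def cross_spectral_summary(p_hj, p_hh, p_jj):
    """Co-spectrum, quadrature, amplitude, phase, and squared coherency.

    p_hj is the complex cross-spectral density at one frequency; p_hh and
    p_jj are the (real, positive) autospectral densities at that frequency.
    """
    co = p_hj.real               # c_hj(f) = Re{p_hj(f)}
    quad = -p_hj.imag            # q_hj(f) = -Im{p_hj(f)}
    amplitude = abs(p_hj)        # alpha_hj(f)
    # cmath.phase returns atan2(Im, Re) = atan2(-q, c); this agrees with
    # tan^{-1}(-q/c) when c > 0 and also resolves the quadrant otherwise.
    phase = cmath.phase(p_hj)
    coherency_sq = amplitude ** 2 / (p_hh * p_jj)
    return {
        "co": co,
        "quad": quad,
        "amplitude": amplitude,
        "phase": phase,
        "coherency_sq": coherency_sq,
    }


# Hypothetical spectral values at one frequency.
summary = cross_spectral_summary(complex(3.0, -4.0), p_hh=10.0, p_jj=5.0)
print(summary)
```

Using the two-argument arctangent is a design choice of this sketch; the text defines the phase through $\tan^{-1}$ of the ratio, which leaves the quadrant to be settled by the signs of $c_{hj}(f)$ and $q_{hj}(f)$.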


A14.1.2 Linear Filtering Relations for Stationary Multivariate Processes 

The representation of dynamic linear relationships through the formulation of linear filters 
is fundamental to the study of stationary multivariate time series. An important example 
is the moving average representation of the $k$-dimensional process $Z_t$ in (14.1.4). More
generally, a multivariate linear (time-invariant) filter relating an $r$-dimensional input series
$X_t$ to a $k$-dimensional output series $Z_t$ is given by the form

$$Z_t = \sum_{j=-\infty}^{\infty} \Psi_j X_{t-j} \tag{A14.1.3}$$

where the $\Psi_j$ are $k \times r$ matrices. The filter is physically realizable or causal when the
$\Psi_j = 0$ for $j < 0$, so that $Z_t = \sum_{j=0}^{\infty} \Psi_j X_{t-j}$ is expressible in terms of only present and
past values of the input process $\{X_t\}$. The filter is said to be stable if $\sum_{j=-\infty}^{\infty} \|\Psi_j\| < \infty$,
where $\|A\|$ denotes a norm for the matrix $A$ such as $\|A\|^2 = \mathrm{tr}\{A'A\}$. When the filter
is stable and the input series $X_t$ is stationary with cross-covariance matrices $\Gamma_{xx}(l)$, the
output $Z_t = \sum_{j=-\infty}^{\infty} \Psi_j X_{t-j}$ is a stationary process. The cross-covariance matrices of the
stationary process $\{Z_t\}$ are then given by

$$\Gamma_{zz}(l) = \mathrm{cov}[Z_t, Z_{t+l}] = \sum_{i=-\infty}^{\infty} \sum_{j=-\infty}^{\infty} \Psi_i \Gamma_{xx}(l + i - j) \Psi_j' \tag{A14.1.4}$$

It also follows, from (A14.1.1), that the spectral density matrix of the output $Z_t$ has the
representation

$$P_{zz}(f) = \Psi(e^{i 2\pi f}) P_{xx}(f) \Psi(e^{-i 2\pi f})' \tag{A14.1.5}$$

where $P_{xx}(f)$ is the spectral density matrix of $X_t$, and $\Psi(z) = \sum_{j=-\infty}^{\infty} \Psi_j z^j$ is the transfer
function (matrix) of the linear filter. In addition, the cross-covariance matrices between $Z_t$
and $X_t$ are

$$\Gamma_{zx}(l) = \mathrm{cov}[Z_t, X_{t+l}] = \sum_{j=-\infty}^{\infty} \Psi_j \Gamma_{xx}(l + j)$$

and the cross-spectral density matrix between $Z_t$ and $X_t$ is

$$P_{zx}(f) = 2 \sum_{l=-\infty}^{\infty} \Gamma_{zx}(l) e^{-i 2\pi f l} = \Psi(e^{i 2\pi f}) P_{xx}(f)$$

so the transfer function $\Psi(z)$ satisfies the relation $\Psi(e^{i 2\pi f}) = P_{zx}(f) P_{xx}(f)^{-1}$. In practice,
when a causal linear filter is used to represent the relation between an observable input
process $X_t$ and an output process $Z_t$ in a dynamic system, there will be added unobserved
noise $N_t$ in the system and a dynamic model of the form $Z_t = \sum_{j=0}^{\infty} \Psi_j X_{t-j} + N_t$ will be
useful.

For a special example of the above linear filtering results, consider the basic stationary
vector white noise process $\{a_t\}$ defined in Section 14.1.3, with the properties that
$E[a_t] = 0$, $E[a_t a_t'] = \Sigma$, and $E[a_t a_{t+l}'] = 0$ for $l \ne 0$. Hence, $a_t$ has spectral density
matrix $P_{aa}(f) = 2\Sigma$. Then the process $Z_t = \sum_{j=0}^{\infty} \Psi_j a_{t-j}$, with $\sum_{j=0}^{\infty} \|\Psi_j\| < \infty$, is stationary
and has cross-covariance matrices

$$\Gamma_{zz}(l) = \sum_{j=0}^{\infty} \Psi_j \Sigma \Psi_{j+l}' \tag{A14.1.6}$$

and spectral density matrix

$$P_{zz}(f) = 2 \Psi(e^{i 2\pi f}) \Sigma \Psi(e^{-i 2\pi f})' \tag{A14.1.7}$$

and the cross-covariance matrices between $\{Z_t\}$ and $\{a_t\}$ are $\Gamma_{za}(l) = \Psi_{-l} \Sigma$ for $l \le 0$ and
zero for $l > 0$.

In addition, for the stationary VARMA(p, q) process with infinite MA representation
(14.1.4), the covariance matrix-generating function is given by $G(z) = \sum_{l=-\infty}^{\infty} \Gamma(l) z^l =
\Psi(z^{-1}) \Sigma \Psi(z)'$; hence, the spectral density matrix of the VARMA(p, q) process is given as
in (A14.1.7) with $\Psi(z) = \Phi^{-1}(z) \Theta(z)$.
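
Formula (A14.1.6) can be evaluated directly when only finitely many $\Psi_j$ are nonzero. The following sketch does this in plain Python for a hypothetical bivariate process with $\Psi_0 = I$ and a single further weight $\Psi_1$; the function names and all numerical values are assumptions for illustration.

```python
def mat_mul(a, b):
    return [[sum(a[i][m] * b[m][j] for m in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def mat_add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def transpose(a):
    return [list(col) for col in zip(*a)]


def ma_cross_covariance(psis, sigma, lag):
    """Gamma(l) = sum_j Psi_j Sigma Psi_{j+l}' for psis = [Psi_0, ..., Psi_m], l >= 0."""
    k = len(sigma)
    gamma = [[0.0] * k for _ in range(k)]
    for j in range(len(psis) - lag):
        term = mat_mul(mat_mul(psis[j], sigma), transpose(psis[j + lag]))
        gamma = mat_add(gamma, term)
    return gamma


# Hypothetical weights and white noise covariance matrix.
psi0 = [[1.0, 0.0], [0.0, 1.0]]
psi1 = [[0.5, 0.0], [0.2, 0.3]]
sigma = [[2.0, 1.0], [1.0, 1.0]]

gamma0 = ma_cross_covariance([psi0, psi1], sigma, 0)
gamma1 = ma_cross_covariance([psi0, psi1], sigma, 1)
gamma2 = ma_cross_covariance([psi0, psi1], sigma, 2)
print(gamma0, gamma1, gamma2)
```

Here $\Gamma(0) = \Sigma + \Psi_1 \Sigma \Psi_1'$ and $\Gamma(1) = \Sigma \Psi_1'$, while $\Gamma(l) = 0$ for $l \ge 2$ because the weights cut off after lag one; matrices for negative lags follow from $\Gamma(-l) = \Gamma(l)'$.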


EXERCISES 

14.1. Consider the bivariate VMA(1) process $Z_t = (I - \Theta B) a_t$, with

$$\Theta = \begin{bmatrix} 0.4 & 0.3 \\ -0.5 & 0.8 \end{bmatrix} \qquad \Sigma = \begin{bmatrix} 4 & 1 \\ 1 & \cdots \end{bmatrix}$$

(a) Find the lag 0 and lag 1 autocorrelations and cross-correlations of $Z_t$; that is,
find the matrices $\Gamma(0)$, $\Gamma(1)$, and $\rho(0)$, $\rho(1)$.

(b) Find the individual univariate MA(1) models for $z_{1t}$ and $z_{2t}$; that is, in the
models $z_{it} = (1 - \eta_i B) \varepsilon_{it}$, $i = 1, 2$, find the values of the parameters $\eta_i$ and
$\sigma_{\varepsilon_i}^2 = \mathrm{var}[\varepsilon_{it}]$, from $\Gamma(0)$ and $\Gamma(1)$.

(c) Show that the bivariate VMA(1) model above is invertible, state the matrix
difference equation satisfied by the matrix weights $\Pi_j$ in the infinite AR form
of the VMA(1) model, and explicitly evaluate the $\Pi_j$ for $j = 1, 2, 3, 4$.

(d) It follows from Section 14.5.2 that the diagonal elements of $\Sigma$ represent the
one-step-ahead forecast error variances for the two series when each series is
forecast from the past history of both series, that is, when each series is forecast
based on the bivariate model. Compare these one-step forecast error variances
in the bivariate model with the one-step forecast error variances $\sigma_{\varepsilon_i}^2$ based on
the individual univariate models in (b).

14.2. For the stationary multivariate VAR(1) model $(I - \Phi B) Z_t = a_t$, it is known
that $\Gamma(0) - \Phi \Gamma(0) \Phi' = \Sigma$. Hence, if the model parameters $\Phi$ and $\Sigma$ are
given, this matrix equation may be solved to determine $\Gamma(0)$. In the bivariate
case, this leads to three linear equations in the unknowns $\gamma_{11}(0)$, $\gamma_{12}(0)$,
and $\gamma_{22}(0)$. If these equations are expressed in matrix form as $A[\gamma_{11}(0),
\gamma_{12}(0), \gamma_{22}(0)]' = b$, give explicitly the expressions for $A$ and $b$. Consider the specific
case

$$\Phi = \begin{bmatrix} 0.2 & 0.3 \\ -0.6 & 1.1 \end{bmatrix} \qquad \Sigma = \begin{bmatrix} 4 & 1 \\ 1 & 1 \end{bmatrix}$$

(a) Show that

$$\Gamma(0) = \begin{bmatrix} 5.667 & 4.000 \\ 4.000 & 10.667 \end{bmatrix}$$

Also, determine the stationarity of the VAR(1) model above, state the difference
equation satisfied by the $\Gamma(j)$, $j \ge 1$, and find the values of $\Gamma(1)$, $\Gamma(2)$,
and $\Gamma(3)$. In addition, compute the cross-correlation matrices $\rho(0)$, $\rho(1)$,
$\rho(2)$, and $\rho(3)$.

(b) Find the matrix coefficients $\Psi_1$, $\Psi_2$, and $\Psi_3$ in the infinite MA representation for
$Z_t$, and hence, compute the covariance matrix of the bivariate lead $l$ forecast
errors from the bivariate model using the formula $\Sigma(l) = \sum_{j=0}^{l-1} \Psi_j \Sigma \Psi_j'$, for
$l = 1, 2, 3$.

(c) For a bivariate VAR(1) model, indicate what simplifications occur in the model
when $\Phi$ is lower triangular (i.e., $\phi_{12} = 0$). In particular, show in this case that the
bivariate system can be expressed equivalently in the form of a “unidirectional”
transfer function model, as in Chapter 12, with $z_{1t}$ as input series and $z_{2t}$ as
output. In addition, indicate the specific nature of the univariate ARMA model
for the series $z_{2t}$ implied by this situation.

(d) For a bivariate VAR(1) model, show that the case $\det(\Phi) = 0$ implies that
there exists a linear combination of $z_{1t}$ and $z_{2t}$, $Y_{1t} = c_{11} z_{1t} + c_{12} z_{2t}$, which
is a white noise series, and a second linear combination $Y_{2t} = c_{21} z_{1t} + c_{22} z_{2t}$,
which is again a univariate AR(1) process.

Hint: If $\det(\Phi) = 0$, then $\Phi$ has rank at most one and can be written as

$$\Phi = \begin{bmatrix} 1 \\ \alpha \end{bmatrix} \begin{bmatrix} \phi_{11} & \phi_{12} \end{bmatrix} = \begin{bmatrix} \phi_{11} & \phi_{12} \\ \alpha \phi_{11} & \alpha \phi_{12} \end{bmatrix}$$



14.3. Consider the VAR(p) model

$$Z_t = \Phi_1 Z_{t-1} + \Phi_2 Z_{t-2} + \cdots + \Phi_p Z_{t-p} + a_t$$

Verify that the model can be expressed as a VAR(1) model in terms of the $kp$-dimensional
vector $Y_t = (Z_t', Z_{t-1}', \ldots, Z_{t-p+1}')'$, $Y_t = \Phi Y_{t-1} + e_t$, using the $kp \times kp$
companion matrix $\Phi$ for the VAR(p) operator $\Phi(B) = I - \Phi_1 B - \cdots - \Phi_p B^p$,

$$\Phi = \begin{bmatrix}
\Phi_1 & \Phi_2 & \cdots & \Phi_{p-1} & \Phi_p \\
I & 0 & \cdots & 0 & 0 \\
0 & I & \cdots & 0 & 0 \\
\vdots & \vdots & & \vdots & \vdots \\
0 & 0 & \cdots & I & 0
\end{bmatrix}$$

In addition, show that $\det\{I - \Phi B\} = \det\{I - \Phi_1 B - \cdots - \Phi_p B^p\}$, and hence the
stationarity condition for the VAR(p) process is equivalent to the condition that all
eigenvalues of the companion matrix $\Phi$ be less than one in absolute value. (Hint:
To evaluate $\det\{I - \Phi B\}$, multiply the $i$th column of $I - \Phi B$ by $B$ and add to the
$(i - 1)$st column, successively, for $i = p, p - 1, \ldots, 2$.)


14.4. For a bivariate VAR(2) model $Z_t = \Phi_1 Z_{t-1} + \Phi_2 Z_{t-2} + a_t$, with

$$\Phi_1 = \begin{bmatrix} 1.5 & -0.6 \\ 0.3 & 0.2 \end{bmatrix} \qquad \Phi_2 = \begin{bmatrix} -0.5 & 0.3 \\ 0.7 & -0.2 \end{bmatrix} \qquad \Sigma = \begin{bmatrix} 4 & 1 \\ 1 & 2 \end{bmatrix}$$

(a) Verify that this model is stationary based on the nature of the roots of
$\det\{I - \Phi_1 B - \Phi_2 B^2\} = 0$. (Note that you may want to make use of the result of
Exercise 14.3 for computational convenience.)

(b) Calculate forecasts $\hat{Z}_n(l)$ for $l = 1, \ldots, 5$ steps ahead, given that $Z_n =
(1.2, 0.6)'$ and $Z_{n-1} = (0.5, 0.9)'$.

(c) Find the coefficient matrices $\Psi_j$, $j = 1, \ldots, 4$, in the infinite MA representation
of the process, and find the forecast error covariance matrices $\Sigma(l)$ for $l =
1, \ldots, 5$.

14.5. Consider the simple transfer function model

$$(1 - B) z_{1t} = \varepsilon_{1t} - \theta \varepsilon_{1,t-1} \qquad z_{2t} = \omega z_{1t} + \varepsilon_{2t}$$

where $\varepsilon_{1t}$ and $\varepsilon_{2t}$ are independent white noise processes.

(a) Determine the univariate ARIMA model for $z_{2t}$, and note that $z_{2t}$ is nonstationary.

(b) Express the bivariate model for $Z_t = (z_{1t}, z_{2t})'$ in the general form of a “generalized”
ARMA(1, 1) model, $(I - \Phi_1 B) Z_t = (I - \Theta_1 B) a_t$, and determine that
one of the eigenvalues of $\Phi_1$ is equal to one.

(c) Determine the bivariate model for the first differences $(1 - B) Z_t$, and show
that it has the form of a bivariate IMA(1, 1) model, $(1 - B) Z_t = (I - \Theta B) a_t$,
where the MA operator $(I - \Theta B)$ is not invertible. Hence, this model represents
an “overdifferencing” of the bivariate series $Z_t$.

14.6. Suppose $Z_1, \ldots, Z_N$, with $N = 60$, is a sample from a bivariate VAR(1) process,
with sample covariance matrices obtained as

$$\hat{\Gamma}(0) = \begin{bmatrix} 1.0 & 1.0 \\ 1.0 & 2.0 \end{bmatrix} \qquad \hat{\Gamma}(1) = \begin{bmatrix} 0.6 & 0.4 \\ 0.7 & 1.2 \end{bmatrix} \qquad \hat{\Gamma}(2) = \begin{bmatrix} 0.30 & 0.10 \\ 0.42 & 0.64 \end{bmatrix}$$

(a) Obtain the corresponding estimated correlation matrices $\hat{\rho}(0)$, $\hat{\rho}(1)$, and $\hat{\rho}(2)$.

(b) Find the sample Yule-Walker estimates for $\Phi$ and $\Sigma$ in the VAR(1) model, and
find an estimate for the approximate covariance matrix of the estimator $\hat{\Phi}$, that
is, for the covariance matrix of $\mathrm{vec}[\hat{\Phi}]$.

(c) Based on the results in (b), test whether the matrix $\Phi$ has a lower triangular
structure; that is, test whether $\phi_{12} = 0$.

14.7. Suppose that a three-dimensional VARMA process $Z_t$ has Kronecker
indices $K_1 = 3$, $K_2 = 1$, and $K_3 = 2$.

(a) Write the form of the coefficient matrices $\Phi_j^{\#}$ and $\Theta_j^{\#}$ in the echelon canonical
ARMA model structure of equation (14.7.1) for this process.

(b) For this process $\{Z_t\}$, describe the nature of the zero canonical correlations
that occur in the canonical correlation analysis of the past vector
$P_t = (Z_t', Z_{t-1}', \ldots)'$ and various future vectors $F_{t+1}^{*}$.

(c) Write the form of the minimal dimension echelon state-space model corresponding
to the echelon canonical ARMA model for this process.

14.8. Verify that any VAR(p) model $Z_t = \sum_{j=1}^{p} \Phi_j Z_{t-j} + a_t$ can be
expressed equivalently in the error correction form of equation (14.8.2) as
$W_t = C Z_{t-1} + \sum_{j=1}^{p-1} \Phi_j^{*} W_{t-j} + a_t$, where $W_t = Z_t - Z_{t-1}$, $\Phi_j^{*} = -\sum_{i=j+1}^{p} \Phi_i$,
and $C = -\Phi(1) = -(I - \sum_{j=1}^{p} \Phi_j)$.

14.9. Express the model for the nonstationary bivariate process $Z_t$ given in Exercise
14.5 in an error correction form, similar to equation (14.8.2), as $W_t = C Z_{t-1} +
a_t - \Theta a_{t-1}$, where $W_t = (1 - B) Z_t$. Determine the structure (and the ranks) of the
matrices $C$ and $\Theta$ explicitly.

14.10. Consider analysis of the logarithms of monthly flour price indices from three U.S. 
cities. The raw (unlogged) data were given and analyzed by Tiao and Tsay (1989). To 
first investigate a VAR model for these data, with possible reduced-rank structure, 
the results of the partial canonical correlation analysis of Section 14.7.3, in terms 
of the (squared) partial canonical correlations $\hat{\rho}_i^2(j)$ between $Z_t$ and $Z_{t-j}$ for lags
$j = 1, \ldots, 6$, and the associated test statistic values computed using (14.7.8) are
displayed in the following table:

                                         C(j, r)
 j    Squared Correlations       r = 2      r = 1      r = 0
 1    0.747, 0.814, 0.938       129.93     288.89     551.67
 2    0.003, 0.081, 0.274         0.29       7.97      36.91
 3    0.001, 0.007, 0.035         0.07       0.69       3.76
 4    0.000, 0.015, 0.047         0.03       1.29       5.31
 5    0.017, 0.036, 0.073         1.36       4.24      10.22
 6    0.000, 0.020, 0.077         0.00       1.51       7.49

In addition, values of $|\tilde{\Sigma}_j|$ and of the AIC$_j$ and HQ$_j$ model selection criteria for
the full-rank VAR(j) models are given as follows:

 j (AR order)                           1          2          3          4          5          6
 $|\tilde{\Sigma}_j|$ ($\times 10^{-10}$)    1.66213    1.12396    1.10523    1.06784    0.88963    0.81310
 AIC$_j$                            -22.336    -22.542    -22.369    -22.210    -22.195    -22.084
 HQ$_j$                             -22.240    -22.350    -22.079    -21.822    -21.707    -21.494

Interpret the preliminary model specification information above, and specify 
the structure (order and possible reduced ranks) of a VAR model that may seem 
appropriate for these data based on these results. 



PART FOUR 


DESIGN OF DISCRETE CONTROL 
SCHEMES 


In earlier chapters we studied the modeling of discrete univariate time series and dynamic 
systems involving two or more time series. We saw how once adequate models have been 
developed, they can be used to generate forecasts of future observations, to characterize 
the transfer function of a dynamic system, and to represent the interrelationships among 
several time series of a multivariate dynamic system. Examples involving real-world
applications have been used for illustration. However, the models and the methodology are of
much wider importance than even these applications indicate. The ideas we have outlined
are of importance in the analysis of a wide class of stochastic-dynamic systems occurring,
for example, in economics, engineering, commerce, hydrology, meteorology, and in
organizational studies. 

It is obviously impossible to illustrate every application. Rather, it is hoped that the theory 
and examples of this book will help the reader to adapt the general methodology to their own 
particular problems. In doing this, the dynamic and stochastic models we have discussed 
will often act as building blocks that can be linked together to represent the particular system 
under study. The techniques of identification, estimation, and diagnostic checking, similar 
to those we have illustrated, should be useful to establish the model. Finally, recursive 
calculations and the ideas considered under the general heading of forecasting will have 
wider application in evaluating the adequacy and the usefulness of a model for a specific 
purpose once the model has been fitted. 

We shall conclude this book by illustrating these possibilities in one further 
application—the design of feedback and feedforward control schemes. In working through 
Chapter 15, it is the task of bringing together the previously discussed ideas in a fresh 
application, quite as much as the detailed results, that we hope will be of value. 
ASPECTS OF PROCESS CONTROL


The term process control is used in different ways. Shewhart charts and other quality control 
charts are frequently employed in industries concerned with the manufacture of discrete 
“parts” in what is called statistical process control (SPC). By contrast, various forms of 
feedback and feedforward adjustment are used, particularly in the process and chemical 
industries, in what we call engineering process control (EPC). Because the adjustments 
made by engineering process control are usually computed and applied automatically, 
this type of control is sometimes called automatic process control (APC). However, the 
manner in which adjustments are applied is a matter of convenience, so we will not use 
that terminology here. The object of this chapter is to draw on the earlier discussions in 
this book to provide insight into the statistical aspects of these control methods and to 
appreciate better their relationships and objectives. 

We first discuss process monitoring using, for example, Shewhart control charts and 
contrast this with techniques for process adjustment. In particular, a common adjustment 
problem is to maintain an output variable close to a target value in a dynamic system subject 
to disturbances by manipulation of an input variable, to obtain feedback control. Feedback 
control schemes use only the observed deviation of the output from target as a basis for 
adjustment of the input variable. We consider this problem first in a purely intuitive way 
and then relate this to some of the previously discussed stochastic and transfer function 
models to yield feedback control schemes producing minimum mean square error (MMSE) 
at the output. This leads to a discussion of discrete schemes, which are analogs of the 
proportional-integral (PI) schemes of engineering control, and we show how simple charts 
may be devised for manually adjusting processes with PI control. 

It turns out that minimum mean square error control often requires excessively large 
adjustments of the input variable. “Optimal” constrained schemes are, therefore, introduced 
that require much smaller adjustments at the expense of only minor increases in 
the output mean square error. These constrained schemes are generally not PI schemes, 
but in certain important cases it turns out that appropriately chosen PI schemes can often 
closely approximate their behavior. Particularly in industries concerned with the manufacture 
of parts, there may be a fixed cost associated with adjusting the process and, in 
some cases, a monitoring cost associated with obtaining an observation. We therefore also 
discuss bounded adjustment schemes for feedback control that minimize overall cost in 
these circumstances. 

In some instances, one or more sources of disturbance may be measured, and these 
measurements may be used to compensate potential deviations in the output. This type of 
adjustment action is called feedforward control, as compared to feedback control where 
only the observed deviation of output from target is used as a basis for adjustment. In certain 
instances it may also be desirable to use a combination of these two modes of control, and 
this is referred to as feedforward-feedback control. We therefore also present feedforward 
and feedforward-feedback types of control schemes for a general dynamic system that 
yield minimum mean square error at the output. Finally, we consider a general procedure 
for monitoring control schemes for possible changes in parameter values using Cuscore 
charts. More general discussion is given in the appendices and references. 


15.1 PROCESS MONITORING AND PROCESS ADJUSTMENT 

Process control is no less than an attempt to cancel out the effect of a fundamental physical 
law—the second law of thermodynamics, which implies that if left to itself, the entropy 
or disorganization of any system can never decrease and will usually increase. SPC and 
EPC are two complementary approaches to combat this law. SPC attempts to remove 
disturbances using process monitoring, while EPC attempts to compensate them using 
process adjustment (see also Box and Kramer, 1992). 

15.1.1 Process Monitoring 

The SPC strategy for stabilization of a process is to standardize procedures and raw 
materials and to use hypothesis-generating devices (such as graphs, check sheets, Pareto 
charts, cause-effect diagrams, etc.) to track down and eliminate causes of trouble (see, for 
example, Ishikawa, 1976). Since searching for assignable causes is tedious and expensive, 
it usually makes sense to wait until “statistically significant” deviations from the stable 
model occur before instituting this search. This is achieved by the use of process monitoring 
charts such as Shewhart charts, Cusum charts, and Roberts’ EWMA charts. The philosophy 
is “don’t fix it when it ain’t broke”—don’t needlessly tamper with the process (see, for 
example, Deming, 1986). 

Figure 15.1 shows an example of process monitoring using a Shewhart control chart. 
Condoms were tested by taking a sample of 50 items every 2 hours from routine production, 
inflating them to a very high fixed pressure, and noting the proportion that burst. Figure 15.1 
shows data taken during the startup of a machine making these articles. Studies from similar 
machines had shown that a high-quality product was produced if the proportion failing this 
very severe test was p = 0.20. 

The reference distribution indicated by the bars on the right of Figure 15.1(a) characterizes 
desired process behavior. It is a binomial distribution showing the probabilities of 
getting various proportions failing in random samples of n = 50 when p stays constant at a 
value of 0.20. If the data behaved like a random sequence from this reference distribution, 
we should say the process appeared to be in a state of control and no action would be called 
for. By contrast, if the data did not have this appearance, showing outliers or suspicious 
patterns, we might have reason to suppose that something else was going on. In practice, 
the whole reference distribution would not usually be shown. Instead, upper and lower 
control limits and warning lines would be set. When, as in this case, a normal curve (shown 
as a continuous line) provides a close approximation to the reference distribution, these are 
usually set at $\pm 2\sigma$ and $\pm 3\sigma$ with $\sigma = \sqrt{p(1-p)/n}$, the standard deviation of the sample 
proportion from a binomial distribution. In this example, with p = 0.20 and n = 50, this 
gives $\sigma = 0.057$. Figure 15.1(a) shows that during the startup phase, the process was badly 
out of control, with the proportion of items failing the test initially as high as 50%. A 
process adjustment made after 12 hours of operation brought the proportion of defectives 
down to around 40%, but further changes were needed to get the process to a state of 
control at the desired level of p = 0.20. By a series of management actions, this was 
eventually achieved and Figure 15.1(b) shows the operation of the process at a later stage 
of development. Although for the most part the system now appears to be in the desired state 
of control, notice that the 10th point on the chart fell below the lower $3\sigma$ line. Subsequent 
investigation showed that the testing procedure was responsible for this aberrant point. 
A fault in the air line had developed and the condoms tested at about this time were 
inadvertently submitted to a much reduced air pressure, resulting in a falsely low value of 
the proportion defective. Corrective action was taken and the system was modified so that 
the testing machine would not function unless the air pressure was at the correct setting, 
ensuring that this particular fault could not occur again. 
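
The control limits quoted above can be checked numerically. The following Python sketch is only an illustration under the normal approximation described in the text; the function names `p_chart_limits` and `classify` are hypothetical and are not taken from the book.

```python
import math

def p_chart_limits(p, n):
    """Normal-approximation limits for a sample proportion from samples of size n."""
    sigma = math.sqrt(p * (1.0 - p) / n)  # standard deviation of the sample proportion
    return {
        "sigma": sigma,
        "warning": (p - 2 * sigma, p + 2 * sigma),  # 2-sigma warning lines
        "action": (p - 3 * sigma, p + 3 * sigma),   # 3-sigma control limits
    }

# Values from the example in the text: p = 0.20, n = 50.
limits = p_chart_limits(p=0.20, n=50)

def classify(proportion, lim=limits):
    """Label one observed proportion relative to the warning and control lines."""
    lo, hi = lim["action"]
    if proportion < lo or proportion > hi:
        return "outside 3-sigma limits"
    lo, hi = lim["warning"]
    if proportion < lo or proportion > hi:
        return "outside 2-sigma warning lines"
    return "in control"
```

With these settings a startup proportion of 0.50 is classified as outside the 3-sigma limits, which agrees with the description of the process as badly out of control.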

Monitoring procedures of this kind are obviously of great value. Following Shewhart 
(1931) and Deming (1986), we refer to the natural variation in the process when in state 
of control (binomial variation for a sample of n = 50 with p = 0.20 in this case) as due to 
common causes. The common cause system can only be changed by management action 
that alters the system. Thus, a new type of testing machine might be introduced for which 
the acceptable proportion of defects should be 10%. Common cause variation would then 
be binomial about the value p = 0.10. 

The fault in the air line that was discovered by using the chart is called a special¹ cause. 
By suitable “detective” work, it is often possible for the plant operators to track down 
and eliminate special causes. The objectives of process monitoring are thus (1) to establish 
and continually confirm that the desired common cause system remains in operation and 
(2) to look for deviations unlikely to be due to chance that can lead to the tracking and 
elimination of assignable causes of trouble. 


15.1.2 Process Adjustment 

Although we must always make a dedicated endeavor to remove causes of variation such 
as unsatisfactory testing methods, differences in raw materials, differences in operators, 
and so on, some processes cannot be fully brought to a satisfactory state of stability in 
this way. Despite our best efforts, there remains a tendency for the process to wander off 
target. This may be due to known but uncontrollable phenomena such as variations in 
ambient temperature, humidity, and feedstock quality, or due to causes currently unknown. 
In such circumstances, some system of process adjustment or regulation may be necessary 
in which manipulation of some additional variable is used to compensate for deviations in 
the quality characteristic. 

¹Also called an “assignable” cause. However, we are sometimes faced with a system that is demonstrably not in 
a state of control and yet no causative reason can be found. So we will stay with Deming in his less optimistic 
word “special.” 

FIGURE 15.2 One hundred successive values of the thickness of a metallic film when no adjustment 
was applied. 

To fix ideas, we first introduce a simple feedback adjustment scheme relying on a 
purely empirical argument and leave theoretical justification until later. Consider the measurements 
shown in Figure 15.2 of the thickness of a very thin metallic film taken at equally 
spaced units of time. The quality characteristic was badly out of control, but standard procedures 
failed to stabilize it (Box, 1991a). Suppose that the disturbance $N_t$ is defined as 
the deviation of this quality characteristic from its target value T when no adjustment is 
made; that is, $N_t$ is the underlying noise process. Suppose also that there is a manipulable 
variable, deposition rate X, which can be used conveniently to adjust the thickness, and 
that a unit change in X will produce g units of change in thickness and will take full effect 
in one time interval. If at time t, X was set equal to $X_t$, then at time t + 1 the deviation 
from target, $\varepsilon_{t+1} = Y_{t+1} - T$, after adjustment would be 

$\varepsilon_{t+1} = gX_t + N_{t+1}$ (15.1.1) 

Now suppose that at time t you can, in some way or other, compute an estimate (forecast) 
$\hat{N}_t(1)$ of $N_{t+1}$ and that this forecast has an error $e_t(1)$, so that 

$N_{t+1} = \hat{N}_t(1) + e_t(1)$ (15.1.2) 

Then using (15.1.1) and (15.1.2), 

$\varepsilon_{t+1} = gX_t + \hat{N}_t(1) + e_t(1)$ (15.1.3) 

If, in particular, X can be adjusted so that at time t, 

$X_t = -\frac{1}{g}\hat{N}_t(1)$ (15.1.4) 

then for the adjusted process 

$\varepsilon_{t+1} = e_t(1)$ (15.1.5) 

Thus, the deviation from target $\varepsilon_{t+1}$ for the adjusted process would now be the error $e_t(1)$ in 
forecasting $N_{t+1}$, instead of the deviation $N_{t+1}$ measured when the process is not adjusted. 

If we used measurements of one or more of the known disturbing input factors (e.g., 
ambient temperature) to calculate the estimate $\hat{N}_t(1)$ of $N_{t+1}$, we would have an example of 
feedforward control. If the estimate $\hat{N}_t(1)$ of $N_{t+1}$ directly or indirectly used only present 
and past values of the output disturbance $N_t, N_{t-1}, N_{t-2}, \ldots$, equation (15.1.4) would 
define a system of feedback control. A system of mixed feedback-feedforward control 
would employ both kinds of data. For simplicity, we will focus on the feedback case in the 
next three sections, and consider feedforward and mixed control in Section 15.5. 


15.2 PROCESS ADJUSTMENT USING FEEDBACK CONTROL 

Empirical Introduction. It might often be reasonable to use for the estimate $\hat{N}_t(1)$ in 
(15.1.4) some kind of weighted average of past values $N_t, N_{t-1}, N_{t-2}, \ldots$. In particular, 
an exponentially weighted moving average (EWMA) has intuitive appeal since recently 
occurring data are given most weight. Suppose, then, that $\hat{N}_t(1)$ is an EWMA, 

$\hat{N}_t(1) = \lambda(N_t + \theta N_{t-1} + \theta^2 N_{t-2} + \cdots) \qquad 0 \le \theta < 1$ (15.2.1) 

where $\theta$ is the smoothing constant and $\lambda = 1 - \theta$. 

We first consider the situation where, as has usually been the case in the process 
industries, adjustments are continually made as each observation comes to hand. Then 
using equation (15.1.4), the adjustment (change in deposition rate) made at time t would 
be given by 

$X_t - X_{t-1} = -\frac{1}{g}[\hat{N}_t(1) - \hat{N}_{t-1}(1)]$ (15.2.2) 

Now with $e_{t-1}(1) = N_t - \hat{N}_{t-1}(1)$ the forecast error, the updating formula for an EWMA 
forecast can be written as 

$\hat{N}_t(1) - \hat{N}_{t-1}(1) = \lambda e_{t-1}(1)$ (15.2.3) 

Therefore, for any feedback scheme in which the compensatory variable X was set so as 
to cancel out an EWMA of the noise $\{N_t\}$, the required adjustment should be such that 

$X_t - X_{t-1} = -\frac{\lambda}{g}e_{t-1}(1) = -\frac{\lambda}{g}\varepsilon_t$ (15.2.4) 

For the metal deposition process, g = 1.2, $\lambda = 0.2$, and T = 80, so that the adjustment 
equation is 

$X_t - X_{t-1} = -\frac{1}{6}\varepsilon_t$ (15.2.5) 

FIGURE 15.3 Manual adjustment chart for thickness that allows the operator to read off the 
appropriate change in deposition rate. 


15.2.1 Feedback Adjustment Chart 

This kind of adjustment is very easily applied, as is shown in Figure 15.3. This shows a 
manual feedback adjustment chart (Box and Jenkins, 1976) for the metallic thickness 
example given previously. To use it, the operator records the latest value of thickness 
and reads off on the adjustment scale the appropriate amount by which he or she should 
now increase or decrease the deposition rate. For example, the first recorded thickness of 
80 is on target, so no action is called for. The second value of 92 is 12 units above the 
target, so $\varepsilon_2 = 12$, corresponding on the left-hand scale to a deposition rate adjustment of 
$X_2 - X_1 = -2$. Thus, the operator should now reduce the deposition rate by 2 units from 
its previous level. 
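
The reading taken from the chart is simply equation (15.2.5) evaluated at the latest thickness. A minimal Python sketch, assuming the values T = 80, g = 1.2, and λ = 0.2 given in the text (the function name `chart_adjustment` is hypothetical):

```python
def chart_adjustment(thickness, target=80.0, g=1.2, lam=0.2):
    """Change in deposition rate, X_t - X_{t-1} = -(lam / g) * (thickness - target)."""
    return -(lam / g) * (thickness - target)
```

Calling `chart_adjustment(92)` reproduces the reduction of 2 units in the worked example, and a reading on target calls for no change.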

Notice that the successive recorded thickness values shown on this chart are the readings 
that would actually occur after adjustment; the underlying disturbance is, of course, not 
seen on this chart. In this example, over the recorded period of observation, the chart 
produces a more than fivefold reduction in mean square error; the standard deviation of the 
adjusted thickness being now only about $\hat{\sigma}_\varepsilon = 11$. Notice the following: 

1. The chart is no more difficult to use than a Shewhart chart. 

2. While the “intuitive” adjustment would be $-(1/g)\varepsilon_t = -(5/6)\varepsilon_t$ (corresponding 
to what Deming called “tinkering”), the adjustment given by equation (15.2.4) is 
$-(\lambda/g)\varepsilon_t = -(1/6)\varepsilon_t$. Thus, it uses a discounted or “damped” estimate $\lambda\varepsilon_t$ of the 
deviation from target to determine the appropriate adjustment, where the discount 
factor $\lambda$ is $1 - \theta$, with $\theta$ being the smoothing constant of the EWMA estimate of the 
noise. 

3. By summing equation (15.2.4), we see that the total adjustment at time t is 

$X_t = k_0 + k_I\sum_{i=1}^{t}\varepsilon_i$ (15.2.6) 

with $k_0 = X_0$ and $k_I = -\lambda/g$. This adjustment procedure thus depends on the cumulative 
sum of the adjustment errors $\varepsilon_i$, and the constant $k_I$ determines how much the 
“intuitive” adjustment is discounted. 

FIGURE 15.4 Dashes indicate the total adjustment $\hat{N}_t(1) = -gX_t$ achieved by the manual adjustment 
chart of Figure 15.3. 


4. It follows from the previous argument that the adjustment is also equivalent to 
estimating at each time t the next value of the total unadjusted disturbance $N_{t+1}$ by 
an exponentially weighted average of its past values and using this estimate to make 
an appropriate adjustment. This is illustrated for the metallic thickness example in 
Figure 15.4. Notice that in this preliminary discussion we have not explicitly assumed 
any particular time series model or claimed any particular optimal properties for the 
procedure. That the procedure can be discussed in such terms accounts, to some 
extent, for its remarkable robustness, which we discuss later. 
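
The equivalence stated in item 4 can be checked numerically for any disturbance series. The Python sketch below applies the adjustment rule (15.2.4) and, separately, the EWMA updating formula (15.2.3), and compares the total compensation $-gX_t$ with the EWMA forecast. The disturbance used here (a random walk plus independent error) and the starting values $X_0 = 0$ and $\hat{N}_0(1) = 0$ are assumptions made only for this illustration, and the function names are hypothetical.

```python
import random

def run_integral_adjustment(noise, g=1.2, lam=0.2):
    """Apply x_t = -(lam/g) * eps_t to a disturbance series, starting from X = 0."""
    X = 0.0
    eps, total_comp = [], []
    for n_next in noise:
        e = g * X + n_next        # deviation seen at the output, eq. (15.1.1)
        X += -(lam / g) * e       # adjustment, eq. (15.2.4)
        eps.append(e)
        total_comp.append(-g * X) # total compensation after this adjustment
    return eps, total_comp

def ewma_forecasts(noise, lam=0.2):
    """One-step EWMA forecasts from the updating formula (15.2.3), starting at 0."""
    f, out = 0.0, []
    for n in noise:
        f += lam * (n - f)
        out.append(f)
    return out

# Illustrative disturbance only: a wandering level plus transitory error.
rng = random.Random(1)
level, noise = 0.0, []
for _ in range(200):
    level += rng.gauss(0.0, 1.0)
    noise.append(level + rng.gauss(0.0, 2.0))

eps, total_comp = run_integral_adjustment(noise)
forecasts = ewma_forecasts(noise)
max_gap = max(abs(a - b) for a, b in zip(total_comp, forecasts))
```

Up to rounding error, `max_gap` is zero: the operator who follows the chart is implicitly forecasting the unadjusted disturbance by an EWMA, even though that disturbance is never observed directly.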

In summary, then: 

1. By process monitoring we mean the use of, for example, Shewhart charts and/or 
Cusum or Cuscore charts, as discussed by Box and Ramirez (1992). These are 
devices for continually checking a model that represents the desired ideal stable state 
of the system: for example, normal, independent, identically distributed (iid) variation 
about a fixed target T. The use of such charts can lead to the elimination of special 
causes pointed to by discrepant behavior. The judgment that behavior is sufficiently 
discrepant to merit attention is decided by a process analogous to hypothesis testing. 
Its properties are described in terms of probabilities (e.g., the probability of a point 
falling outside the $3\sigma$ limits of a Shewhart chart). 

2. By process adjustment we mean the use of feedback and feedforward control or 
some combination of these to maintain the process as close as possible to some 
desired target value. Process adjustment employs a system of statistical estimation 
(forecasting) rather than of hypothesis testing, and its properties are described, for 
example, by output mean square error. Process monitoring and process adjustment 
are complementary rather than competitive, corresponding to the complementary roles 
of hypothesis testing and estimation (see, for example, Box, 1980). We discuss this 
point more fully later in the chapter. 
FIGURE 15.5 Feedback control loop (disturbance $N_t$; output error $\varepsilon_t = N_t + \mathcal{Y}_t$). 


15.2.2 Modeling the Feedback Loop 

A somewhat more general system of feedback control is shown in Figure 15.5. The process 
is affected by a disturbance that in the absence of compensatory action would cause the 
output quality characteristic to deviate from target by an amount $N_t$. Thus, $\{N_t\}$ is a time 
series exemplifying what would happen at the output if no control were applied. In fact, a 
compensating variable $X_t$ (deposition rate in our example) can be manipulated to cancel 
out this disturbance as far as possible. Changes in X will pass through the process and be 
acted on by its dynamics to produce at time t an amount of compensation $\mathcal{Y}_t$ at the output 
(again measured as a deviation from target). To the extent that this compensation $\mathcal{Y}_t$ fails to 
cancel out the disturbance $N_t$, there will be an error, or deviation from target $\varepsilon_t = Y_t - T$, 
equal to $\varepsilon_t = N_t + \mathcal{Y}_t$. The controller is some means (automatic or manual) that brings 
into effect the control equation $X_t = f(\varepsilon_t, \varepsilon_{t-1}, \ldots)$, which adjusts the output depending on 
present and past errors. 

A device that has been used in the process industries for many years is the three-term 
controller. Controllers of this kind are usually operated automatically and employ 
continuous rather than discrete measurement and adjustment. If $\varepsilon_t$ is the error at the output 
at time t, control action could, in particular, be made proportional to $\varepsilon$ itself, to its integral 
with respect to time, or to its derivative with respect to time. A three-term controller uses a 
linear combination of these modes of control action, so that if $X_t$ indicates the level of the 
manipulated variable at time t, the control equation is of the form 

$X_t = k_0 + k_D\frac{d\varepsilon_t}{dt} + k_P\varepsilon_t + k_I\int\varepsilon_t\,dt$ (15.2.7) 

where $k_D$, $k_P$, and $k_I$ are constants. 

Frequently, only one or two of these three modes of action are used. In particular, if only 
$k_I$ is nonzero ($k_D = 0$, $k_P = 0$), we have integral control. If only $k_I$ and $k_P$ are nonzero 
($k_D = 0$), we have proportional-integral (PI) control. 

Notice that in the example we have just discussed, where the result of any adjustment 
fully takes effect at the output in one time interval, the dynamics of the process are 
represented by $\mathcal{Y}_t = gX_{t-1} = gBX_t$. The control equation $X_t = k_0 + k_I\sum_{i=1}^{t}\varepsilon_i$ in (15.2.6) 
is then the discrete analog of the control engineer’s integral control. 
In general, the discrete analog of (15.2.7) is 

$X_t = k_0 + k_D\nabla\varepsilon_t + k_P\varepsilon_t + k_I\sum_{i=1}^{t}\varepsilon_i$ 

or in terms of the adjustment to be made, 

$x_t = X_t - X_{t-1} = k_D\nabla^2\varepsilon_t + k_P\nabla\varepsilon_t + k_I\varepsilon_t = c_1\varepsilon_t + c_2\varepsilon_{t-1} + c_3\varepsilon_{t-2}$ 

where $c_1$, $c_2$, and $c_3$ are suitable constants. Not unexpectedly, control equations of this type 
are of considerable practical value. 
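
Expanding $\nabla\varepsilon_t = \varepsilon_t - \varepsilon_{t-1}$ and $\nabla^2\varepsilon_t = \varepsilon_t - 2\varepsilon_{t-1} + \varepsilon_{t-2}$ in the adjustment equation gives the constants explicitly: $c_1 = k_D + k_P + k_I$, $c_2 = -2k_D - k_P$, and $c_3 = k_D$. The Python sketch below records this mapping; the function names are hypothetical and the expansion is worked out here rather than quoted from the text.

```python
def pid_to_adjustment_coeffs(k_d, k_p, k_i):
    """Map (k_D, k_P, k_I) to (c1, c2, c3) in x_t = c1*e_t + c2*e_{t-1} + c3*e_{t-2}."""
    c1 = k_d + k_p + k_i
    c2 = -2.0 * k_d - k_p
    c3 = k_d
    return c1, c2, c3

def adjustment(errors, k_d, k_p, k_i):
    """Adjustment x_t from the three most recent errors; errors[-1] is the latest."""
    c1, c2, c3 = pid_to_adjustment_coeffs(k_d, k_p, k_i)
    return c1 * errors[-1] + c2 * errors[-2] + c3 * errors[-3]
```

For pure integral control ($k_D = k_P = 0$) only $c_1 = k_I$ survives, and for PI control the mapping reduces to $c_1 = k_I + k_P$ and $c_2 = -k_P$.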


15.2.3 Simple Models for Disturbances and Dynamics 

So far we have introduced a simple system of feedback control on purely empirical grounds. The 
efficiency of any such system will depend on the nature of the disturbance and the dynamics 
of the process. From a theoretical point of view, we can consider very general models for 
noise and dynamics and then proceed to find the control equation that “optimizes” the system 
in accordance with some criterion. However, the practical effectiveness of such models 
is usually determined by whether they, and the “optimization” criterion, make broad scientific 
sense and by their robustness to likely deviations from the ideal. We have already kept 
this in mind when discussing control procedures from a purely commonsense point of view 
and we will continue to do so when choosing models for the disturbance and for process 
dynamics. 


Characterizing Appropriate Disturbance Models with a Variogram. A tool that helps 
to characterize process disturbances is the standardized variogram, which measures the 
variance of the difference between observations m steps apart compared to the variance of 
the difference of observations one step apart: 

$G_m = \frac{\mathrm{var}[N_{t+m} - N_t]}{\mathrm{var}[N_{t+1} - N_t]} = \frac{V_m}{V_1}$ (15.2.8) 

For a stationary process, $G_m$ is a simple function of the autocorrelation function. In fact, then, 
$G_m = (1 - \rho_m)/(1 - \rho_1)$. However, the variogram can be used to characterize nonstationary 
as well as stationary behavior. Figure 15.6 shows realizations of 100 observations initially 
on target generated by (a) a white noise process, (b) a first-order autoregressive process, and 
(c)-(f) IMA(0,1,1) processes with $\lambda$ = 0.1, 0.2, 0.3, 0.4, respectively. The corresponding 
theoretical standardized variograms for these time series models are also shown. 

FIGURE 15.6 Realization of white noise, autoregressive, and IMA(0,1,1) time series with theoretical 
variograms. 

In some imaginary world we might, once and for all, set the controls of a machine and 
give a set of instructions to an ever-alert and never-forgetting operator, and this would 
yield a perfectly stable process from that point on. In such a case the disturbance might 
be represented by a “white noise” series, and its corresponding standardized variogram 
$G_m$ would be independent of m and equal to 1. But, in reality, left to themselves, machines 
involved in production are slowly losing adjustment and wearing out, and left to 
themselves, people tend, gradually, to forget instructions and miscommunicate. Thus, for 
an uncontrolled disturbance, some kind of monotonically increasing variogram would be 
expected. We cannot obtain such a variogram from a linear stationary model, for although 
$G_m$ can initially increase with m, it will always approach an asymptote for such a process. 
That this can happen quite quickly, even when successive observations are highly positively 
correlated, is illustrated by the variogram shown in the figure for the first-order stationary 
autoregressive time series model $N_t = 0.9N_{t-1} + a_t$. In this example, even though successive 
deviations $N_t$ from the target value have autocorrelation 0.9, $G_m$ is already within about 12% 
of its asymptotic value after only 20 lags, since $G_{20}/G_\infty = 1 - 0.9^{20} \approx 0.88$. This implies that, for example, when generated 
by such a model, observations 100 steps apart differ little more than those 20 steps apart. 
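
The theoretical variograms of these three model types are easy to tabulate. In the Python sketch below, the AR(1) expression is (15.2.8) with $\rho_m = \phi^m$, so that $G_m/G_\infty = 1 - \phi^m$. The IMA(0,1,1) expression, $G_m = 1 + (m-1)\lambda^2/(1+\theta^2)$, is derived here by writing $N_{t+m} - N_t$ as a sum of m first differences; it is an assumption of this sketch rather than a formula quoted from the text, and the function names are hypothetical.

```python
def variogram_white_noise(m):
    """Standardized variogram of white noise: constant at 1 for every lag m >= 1."""
    return 1.0

def variogram_ar1(m, phi):
    """G_m = (1 - rho_m) / (1 - rho_1) with rho_m = phi**m for a stationary AR(1)."""
    return (1.0 - phi ** m) / (1.0 - phi)

def variogram_ima011(m, lam):
    """G_m for N_t - N_{t-1} = a_t - theta * a_{t-1} with theta = 1 - lam."""
    theta = 1.0 - lam
    return 1.0 + (m - 1) * lam ** 2 / (1.0 + theta ** 2)

phi = 0.9
asymptote = 1.0 / (1.0 - phi)                       # limit of G_m as m grows
ratio_20 = variogram_ar1(20, phi) / asymptote       # equals 1 - phi**20
ratio_100 = variogram_ar1(100, phi) / asymptote     # equals 1 - phi**100
```

The AR(1) variogram flattens out, whereas the IMA(0,1,1) variogram keeps increasing linearly in m, with a slope that grows with $\lambda$.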

A model that can approximate the behavior of an uncontrolled system that continuously 
increases its entropy may be arrived at by thinking of the disturbance as containing two 
parts, a transitory part $b_t$ and a nontransitory part $z_t$: 

$N_t = b_t + z_t$ (15.2.9) 

The transitory part $b_t$ is associated only with the tth observation and is supposed independent 
of observations taken at every other time. Typical sources contributing to $b_t$ are 
measurement and sampling errors. We represent this transitory part by random drawings 
from a distribution having mean zero and variance $\sigma_b^2$; that is, $\{b_t\}$ is a white noise process. 

Sticky Innovation Model. The evolving nontransitory part $z_t$ represents innovations that 
enter the system from time to time and get stuck there. These “sticky” innovations can 
arise from a multitude of causes, such as wear, corrosion, and human miscommunication. 
Thus, a car tire hits a sharp stone and from that point onward the tread is slightly damaged; 
a tiny crater caused by corrosion appears on the surface of a driving shaft and remains 
there; certain details in the standard procedure for taking blood pressure in a hospital are 
forgotten and from that point on permanently omitted or changed. It is these nontransitory 
or sticky innovations that constitute the unwanted “signal” we wish to cancel out. Every 
system is subject to such influences. They continuously drive the increase in entropy if 
nothing is done to combat them. Such a sticky innovation model was suggested by Barnard 
(1959) and has a variogram that increases linearly with m. A special case of this model, 
which may also be used to approximate it, is the IMA(0,1,1) model: 

$N_t - N_{t-1} = a_t - \theta a_{t-1}$ (15.2.10) 

Recall, also, from Appendix A4.3 that if the nontransitory process $z_t$ is an IMA(0,1,1) 
process, then the disturbance process $N_t = b_t + z_t$ in (15.2.9), with the added white noise 
$b_t$, will again follow an IMA(0,1,1) model. Since for the IMA model (15.2.10) the EWMA 
of equation (15.2.1) with smoothing parameter $\theta$ provides a minimum mean square error 
(MMSE) forecast with forecast error $e_{t-1}(1) = a_t$, the corresponding discrete “integral” 
controller of (15.2.6) with $k_I = -\lambda/g$ produces MMSE control with $\varepsilon_t = a_t$. As we discuss 
later more formally, this is then a special case of the general MMSE linear feedback control 
scheme. 
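
A short simulation illustrates the claim that integral control with $k_I = -\lambda/g$ reduces the output deviation to the white noise shocks when the disturbance follows (15.2.10). The Python sketch below assumes unit-variance Gaussian shocks, zero starting values, and the one-interval dynamics $\mathcal{Y}_t = gX_{t-1}$; the function name, seed, and series length are arbitrary choices made for this illustration.

```python
import random

def simulate_ima_with_integral_control(n=2000, lam=0.2, g=1.2, seed=7):
    """Simulate an IMA(0,1,1) disturbance with and without integral adjustment."""
    rng = random.Random(seed)
    theta = 1.0 - lam
    a_prev, N, X = 0.0, 0.0, 0.0
    shocks, unadjusted, adjusted = [], [], []
    for _ in range(n):
        a = rng.gauss(0.0, 1.0)
        N = N + a - theta * a_prev  # disturbance, eq. (15.2.10)
        eps = N + g * X             # output deviation under the previous setting
        X += -(lam / g) * eps       # integral adjustment with k_I = -lam/g
        shocks.append(a)
        unadjusted.append(N)
        adjusted.append(eps)
        a_prev = a
    return shocks, unadjusted, adjusted

def mse(values):
    """Mean square of a series of deviations from target."""
    return sum(v * v for v in values) / len(values)

shocks, unadjusted, adjusted = simulate_ima_with_integral_control()
max_gap = max(abs(e - a) for e, a in zip(adjusted, shocks))
```

With these starting values the adjusted deviations coincide with the shocks $a_t$ up to rounding error, while the mean square of the unadjusted disturbance grows with the length of the record.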

Dynamics. In discussion of the integral control scheme of equation (15.2.6), we assumed 
that any change made at the input of the system would have its full effect at the output in 
one time interval. The assumed dynamic equation for the response $\mathcal{Y}_t$ was, therefore, 

$\mathcal{Y}_t = gBX_{t+}$ (15.2.11) 

where we now denote the fixed level of the “pulsed” input X in the time interval from t 
until t + 1 by $X_{t+}$. A somewhat more general assumption is that the system can be described 
by the first-order difference equation 

$(1 + \xi\nabla)\mathcal{Y}_t = gBX_{t+}$ (15.2.12) 

(see, for example, (11.3.6)) or, equivalently, 

$(1 - \delta B)\mathcal{Y}_t = (1 - \delta)gBX_{t+} \qquad -1 < \delta < 1$ (15.2.13) 

where $\xi = \delta/(1 - \delta)$ or, equivalently, $\delta = \xi/(1 + \xi)$. In that case at time t + 1 [cf. (15.1.1)], 
the deviation from target after adjustment is 

$\varepsilon_{t+1} = \mathcal{Y}_{t+1} + N_{t+1}$ 

so that 

$\varepsilon_{t+1} = \frac{g(1-\delta)}{1-\delta B}X_{t+} + \hat{N}_t(1) + e_t(1)$ 

where $\hat{N}_t(1)$ is some forecast of $N_{t+1}$ made at time t with forecast error $e_t(1)$. Then, if we 
use the adjustment equation 

$X_{t+} - X_{t-1+} = x_t = -\frac{1-\delta B}{(1-\delta)g}[\hat{N}_t(1) - \hat{N}_{t-1}(1)]$ 

the deviation $\varepsilon_{t+1}$ from the target is equal to the forecast error $e_t(1)$. Thus, again we 
substitute the error in forecasting $N_{t+1}$ for the deviation $N_{t+1}$ itself. In particular, if $\hat{N}_t(1)$ 
is an EWMA forecast with smoothing parameter $\theta$ and if $\lambda = 1 - \theta$, then using (15.2.3) 

$x_t = (1 - B)X_{t+} = -\frac{\lambda(1-\delta B)}{g(1-\delta)}\varepsilon_t = -\frac{\lambda(1-\delta) + \lambda\delta\nabla}{g(1-\delta)}\varepsilon_t$ (15.2.14) 

Finally, if $N_t$ can be represented by an IMA(0,1,1) process with parameter $\theta$, then $\varepsilon_t = a_t$, 
and this adjustment will yield MMSE control. After summing (15.2.14), we obtain 

$X_t = k_0 + k_P\varepsilon_t + k_I\sum_{i=1}^{t}\varepsilon_i$ (15.2.15) 

in which 

$k_P = -\frac{\lambda\xi}{g}$ and $k_I = -\frac{\lambda}{g}$ 

The control equation (15.2.15) yields the discrete analog of continuous PI control mentioned 
earlier and will hereafter be referred to as (discrete) PI control. 

Notice that despite their interesting ramifications, the adjustment equations corresponding 
to discrete integral control and PI control are extremely simple and intuitive. For discrete 
integral control 

$x_t = c_1\varepsilon_t$ (with $c_1 = k_I$) 

and for PI control 

$x_t = c_1\varepsilon_t + c_2\varepsilon_{t-1}$ (with $c_1 = k_I + k_P$ and $c_2 = -k_P$) 

They thus make the adjustment $x_t$ depend linearly on the last error and the last two errors, 
respectively. 
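
The PI rule can be checked in the same way as the integral rule. The Python sketch below simulates the first-order dynamics (15.2.13) driven by an IMA(0,1,1) disturbance and applies the adjustment (15.2.14) in the form $x_t = c(\varepsilon_t - \delta\varepsilon_{t-1})$ with $c = k_I + k_P = -\lambda/[g(1-\delta)]$. The parameter values, zero starting values, seed, and function name are assumptions chosen only for this illustration.

```python
import random

def simulate_pi_first_order(n=2000, lam=0.2, delta=0.5, g=1.2, seed=11):
    """PI adjustment of a first-order system disturbed by IMA(0,1,1) noise."""
    rng = random.Random(seed)
    theta = 1.0 - lam
    c = -lam / (g * (1.0 - delta))  # c = k_I + k_P
    a_prev = N = y = X = eps_prev = 0.0
    shocks, errors = [], []
    for _ in range(n):
        a = rng.gauss(0.0, 1.0)
        N = N + a - theta * a_prev             # disturbance, eq. (15.2.10)
        y = delta * y + (1.0 - delta) * g * X  # dynamics, eq. (15.2.13); X was set last period
        eps = N + y                            # deviation from target
        X += c * (eps - delta * eps_prev)      # adjustment, eq. (15.2.14)
        shocks.append(a)
        errors.append(eps)
        a_prev, eps_prev = a, eps
    return shocks, errors

shocks, errors = simulate_pi_first_order()
max_gap = max(abs(e - a) for e, a in zip(errors, shocks))
```

As with integral control under one-interval dynamics, the deviations from target reduce to the shocks $a_t$, here with the proportional term compensating for the inertia in the process.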

15.2.4 General Minimum Mean Square Error Feedback Control Schemes 

Arguing as earlier, it is not difficult to derive theoretical minimum mean square error 
feedback control schemes for the more general stochastic and linear dynamic models 
discussed in Chapters 4 and 11. Suppose the response to the series of adjustments in the 
manipulable input variable $X_t$ is represented by the dynamic transfer function relation 
(11.2.3), written as 

$\mathcal{Y}_t = L_1^{-1}(B)L_2(B)B^{f+1}X_{t+}$ 

where $L_1(B)$ and $L_2(B)$ are polynomials in B. This relation allows for f periods of pure 
dead time in the response. In addition, assume the noise or process disturbances $\{N_t\}$ may 
be represented by the linear stochastic ARIMA process defined by 

$N_t = \varphi^{-1}(B)\theta(B)a_t = \left(1 + \sum_{i=1}^{\infty}\psi_i B^i\right)a_t$ 

where $a_t$ is a white noise process. Then the error at the output, $\varepsilon_{t+f+1} = Y_{t+f+1} - T$, at 
time t + f + 1 can be written 

$\varepsilon_{t+f+1} = \mathcal{Y}_{t+f+1} + N_{t+f+1} = L_1^{-1}(B)L_2(B)X_{t+} + N_{t+f+1}$ 

Clearly, the effect of the disturbance at time t + f + 1 would be canceled if it were 
possible to set $X_{t+} = -L_1(B)L_2^{-1}(B)N_{t+f+1}$. Since f + 1 is positive, this is not possible, 
but intuitively we can obtain minimum mean square error control by replacing $N_{t+f+1}$ by its 
optimal forecast $\hat{N}_t(f+1)$ at origin t. Now we can write $N_{t+f+1} = \hat{N}_t(f+1) + e_t(f+1)$, 
where $\hat{N}_t(f+1)$ is the forecast at time t of $N_{t+f+1}$ and $e_t(f+1)$ is the error of the forecast 
for f + 1 steps ahead. The noise $N_{t+f+1}$ is not known at time t, but its minimum mean square 
error forecast $\hat{N}_t(f+1)$ can be deduced from the error sequence $\varepsilon_t, \varepsilon_{t-1}, \varepsilon_{t-2}, \ldots$, which is 
observed. Thus, it follows that the control equation $X_{t+} = -L_1(B)L_2^{-1}(B)\hat{N}_t(f+1)$ will 
produce at time t + f + 1 a level at the output that will cancel out the forecast of the noise 
f + 1 periods ahead, and the error at the output will then be $\varepsilon_{t+f+1} = e_t(f+1)$, the error 
of the forecast. To express the control equation in terms of the error sequence $\varepsilon_t$, we can 
write 

$\varepsilon_t = e_{t-f-1}(f+1) = a_t + \psi_1 a_{t-1} + \cdots + \psi_f a_{t-f} = L_4(B)a_t$ 

and 

$\hat{N}_t(f+1) = \psi_{f+1}a_t + \psi_{f+2}a_{t-1} + \cdots = L_3(B)a_t$ 

where the operators $L_3(B)$ and $L_4(B)$ are determined from knowledge of the model $N_t = 
\varphi^{-1}(B)\theta(B)a_t = \psi(B)a_t$ for the noise process. Hence, we have 

$\hat{N}_t(f+1) = L_3(B)L_4^{-1}(B)\varepsilon_t$ 

Therefore, the MMSE feedback control equation is 

$X_{t+} = -\frac{L_1(B)L_3(B)}{L_2(B)L_4(B)}\varepsilon_t$ (15.2.16) 

Alternatively, as is usually convenient, we can define the control action in terms of the 
adjustment $x_t = X_{t+} - X_{t-1+}$ to be made at time t as 

$x_t = -\frac{L_1(B)L_3(B)(1-B)}{L_2(B)L_4(B)}\varepsilon_t$ 


Example: Model with Dead Time. In particular, one more general dynamic model used 
above allows for “dead time”—that is, pure delay in response to adjustment. To illustrate 
the application of equation (15.2.16), consider a first-order system affected by between / 
and / + 1 unit intervals of pure delay so that 

(1 - 5B)y, = g( 1 - <5)[(1 - v) + VB]B'X ( _j (15.2.17) 

Combining this with the IMA(0,1,1) disturbance model of equation (15.2.10), we can use
the general derivation above to obtain the MMSE control scheme. In terms of the general
model, we have $L_2(B)/L_1(B) = g(1-\delta)(1-\nu\nabla)/(1-\delta B)$, and the IMA noise model
yields $\hat{N}_t(f+1) - \hat{N}_{t-1}(f+1) = \lambda a_t$, so that $L_3(B) = \lambda/(1-B)$, and also

$$e_{t-f-1}(f+1) = [1 + \lambda(B + B^2 + \cdots + B^f)]a_t = L_4(B)a_t$$

Hence, for the adjustment $x_t$, we have the relation

$$L_2(B)L_4(B)x_t = -L_1(B)L_3(B)(1-B)\varepsilon_t$$

and we obtain the MMSE control equation as

$$(1-\nu\nabla)[1 + \lambda(B + B^2 + \cdots + B^f)]x_t = -\frac{\lambda}{g(1-\delta)}(1-\delta B)\varepsilon_t$$

Thus, this optimal control scheme is not PI but is of the form 

$$x_t = c_1x_{t-1} + c_2x_{t-2} + \cdots + c_{f+1}x_{t-f-1} + c(\varepsilon_t - \delta\varepsilon_{t-1}) \tag{15.2.18}$$

where $c = -\lambda/[g(1-\delta)] = k_I + k_P$.

An interesting example by Fearn and Maris (1991) describes an MMSE scheme of this 
kind applied to the control of gluten addition to bread-making flour in a flour mill where 
the object was to maintain the protein content of the flour as close as possible to the target 
value. A careful process study showed that to an adequate approximation for this process 
$\delta = 0$, $\nu = 0$, $f = 1$, and $\lambda = 0.25$ ($\theta = 0.75$). The adjustment equation was thus

$$x_t = -0.25x_{t-1} - \frac{0.25}{g}\varepsilon_t \tag{15.2.19}$$

The scheme was tested extensively and the authors remarked that it worked well over a wide 
range of manufacturing conditions and was robust to moderate changes in the parameters. 
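
As a rough numerical check on the dead-time scheme, the following Python sketch expands the left-hand operator $(1-\nu\nabla)[1+\lambda(B+\cdots+B^f)]$ of the MMSE control equation and returns the constant multiplying $(1-\delta B)\varepsilon_t$. The function name, the coefficient-list representation, and the value of $g$ used in the example are illustrative assumptions, not taken from the text.

```python
def mmse_dead_time_operator(lam, delta, nu, f, g):
    """Return (lhs, c) such that  lhs(B) x_t = c * (1 - delta*B) eps_t.

    lhs is a list of coefficients of B^0, B^1, ... (hypothetical representation).
    """
    p1 = [1.0 - nu, nu]            # (1 - nu*nabla) = (1 - nu) + nu*B
    p2 = [1.0] + [lam] * f         # 1 + lam*(B + ... + B^f)
    lhs = [0.0] * (len(p1) + len(p2) - 1)
    for i, a in enumerate(p1):     # polynomial multiplication in B
        for j, b in enumerate(p2):
            lhs[i + j] += a * b
    c = -lam / (g * (1.0 - delta))
    return lhs, c


# Flour-mill values from the text: delta = 0, nu = 0, f = 1, lambda = 0.25.
# g = 2.0 is an arbitrary illustrative gain.
lhs, c = mmse_dead_time_operator(lam=0.25, delta=0.0, nu=0.0, f=1, g=2.0)
print(lhs, c)
```

With these inputs the operator is $1 + 0.25B$, so that $x_t = -0.25x_{t-1} + c\,\varepsilon_t$ with $c = -0.25/g$, in agreement with (15.2.19).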

The flour milling example does not yield a PI scheme. Notice, however, that the adjustment
equation can be written $x_t = -(1+\lambda B)^{-1}(\lambda/g)\varepsilon_t = -(1 - \lambda B + \lambda^2B^2 - \cdots)(\lambda/g)\varepsilon_t$.
For the rather small value $\lambda = 0.25$, if we truncate the expansion after the first-order term,
we obtain the PI scheme $x_t = c_1\varepsilon_t + c_2\varepsilon_{t-1}$ with $c_1 = -\lambda/g$ and $c_2 = \lambda^2/g$. In practice, the
behavior of this PI scheme will be almost identical to that of (15.2.19). More generally, we
will find that PI schemes have an importance in addition to that conferred on them by their
producing MMSE schemes for certain simple models. We therefore next consider how PI
schemes can be put in effect using simple feedback control charts.

15.2.5 Manual Adjustment for Discrete Proportional-Integral Schemes 

The equation for the adjustment $x_t = X_t - X_{t-1}$ for the discrete PI scheme (15.2.15) may
also be written

$$x_t = -G(1 + P\nabla)\varepsilon_t \tag{15.2.20}$$

where

$$-G = k_I \quad\text{and}\quad P = \frac{k_P}{k_I} \tag{15.2.21}$$

or equivalently, $k_I = -G$ and $k_P = -PG$, and $P$ is zero for pure integral control. In
the special case where the stochastic and dynamic models are defined by (15.2.10) and
(15.2.12), respectively, the PI control equation (15.2.15) yields MMSE when $G = \lambda/g$ and
$P = \xi$.

Equation (15.2.20) shows how we can make a manual adjustment chart to put PI control 
into effect. We have already illustrated the use of such a chart for the metallic thickness 
example in Figure 15.3. For further illustration, we adapt an example discussed by Box 
et al. (1978). In a dyeing process, the quality characteristic of interest was the color index. 
Deviations $\varepsilon_t$ from the desired target value of $T = 9$ were compensated by changing the
dye addition rate $X$. For this example, the disturbance in the color index was approximated
by an IMA(0,1,1) model with $\lambda = 0.3$, and a change of 1 unit in the dye addition rate $X$
eventually produced a change of 0.06 unit in the color index so that $g = 0.06$.

Suppose at first that $\xi$ were zero so that the dynamic model was simply $\mathcal{Y}_t = gBX_{t+}$,
implying that a change in the input $X_t$ was fully effective at the output in one time interval.
Then,

$$-G = k_I = -\frac{\lambda}{g} = -\frac{0.30}{0.06} = -5 \quad\text{and}\quad k_P = 0 \tag{15.2.22}$$

The MMSE integral feedback equation would be

$$X_t = k_0 - G\sum_{i=1}^{t}\varepsilon_i = k_0 - 5\sum_{i=1}^{t}\varepsilon_i \tag{15.2.23}$$

and at time $t$ the corresponding adjustment would be

$$x_t = -G\varepsilon_t = -5\varepsilon_t \tag{15.2.24}$$

Appropriate action is read off the manual adjustment chart in Figure 15.7 with scales such
that one unit deviation in the color index corresponds to $-G = -5$ units of adjustment of
the dye addition rate. Action is taken after each observation by recording the value of the
color index (indicated by a filled dot) and reading off on the left-hand scale the required
adjustment to the dye addition rate. Thus, in the diagram at time 1:30 p.m., the color index
was 9.14, calling for a reduction of 0.7 in the dye addition rate (an adjustment of $-0.7$).

Now consider the case where, due perhaps to incomplete mixing of the dye, the process
was subject to inertia, which was approximated by a first-order dynamic system
as in (15.2.13) with $\delta = 0.2$ and consequently $\xi = \delta/(1-\delta) = 0.25$. Thus, as before,
$G = 0.3/0.06 = 5$ and now $P = \xi = 0.25$. Thus, the appropriate MMSE control equation
(15.2.15) would call for proportional-integral action such that

$$X_t = k_0 - 1.25\varepsilon_t - 5\sum_{i=1}^{t}\varepsilon_i \tag{15.2.25}$$

FIGURE 15.7 Manual adjustment chart for discrete integral control.

FIGURE 15.8 Manual adjustment chart putting into effect discrete integral plus proportional
control.

The corresponding adjustment equation is

$$x_t = -5(1 + 0.25\nabla)\varepsilon_t \tag{15.2.26}$$

To put this into effect manually, the chart in Figure 15.8 may be employed with the vertical
dashed lines placed at a fraction $P = k_P/k_I = 0.25$ within each sampling interval. At each
step the operator extrapolates the line through the last two points to the next dashed line
and reads off the appropriate adjustment. Thus, in this figure, the last two readings, at 1:15
and 1:30 p.m., were 9.06 and 9.14. The projected value of 9.16 requires reduction of the
dye addition rate by 0.8 unit. No exactness is required. A line extrapolated by eye is good
enough. As we later explore other uses of PI charts, we will sometimes use schemes in 
which P is negative. This calls for interpolation between the last two points rather than 
extrapolation. 
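
The chart readings quoted above can be reproduced with a few lines of arithmetic. The sketch below is only illustrative: the function name `pi_adjustment` is an assumption, and the two color-index readings (9.06 and 9.14), the target (9), and the constants $G = 5$ and $P = 0.25$ are the values given in the text.

```python
def pi_adjustment(eps_now, eps_prev, G, P):
    """Adjustment x_t = -G * (1 + P*nabla) eps_t for a discrete PI scheme."""
    # Point on the line through the last two deviations, a fraction P
    # of one sampling interval beyond the current observation.
    projected = eps_now + P * (eps_now - eps_prev)
    return -G * projected


target = 9.0
G = 0.3 / 0.06  # lambda / g = 5

# Integral action only (P = 0), using the 1:30 p.m. reading of 9.14
x_integral = pi_adjustment(9.14 - target, 9.06 - target, G, P=0.0)

# Proportional-integral action (P = 0.25), extrapolating through 9.06 and 9.14
x_pi = pi_adjustment(9.14 - target, 9.06 - target, G, P=0.25)

print(round(x_integral, 2), round(x_pi, 2))
```

The two results correspond to the adjustments of about $-0.7$ and $-0.8$ read off Figures 15.7 and 15.8. A negative $P$ would make `projected` an interpolated rather than an extrapolated point.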

Rounded Adjustment. The feedback schemes as so far discussed require that we take some 
action at every opportunity—in this example, every 15 minutes. In practice, usually little 
is lost if the “rounded” adjustment chart indicated in Figure 15.9 is used. Such a chart 
is easily constructed from the original chart by dividing the action scale into bands. The 
adjustment made when an observation falls within the band is that appropriate to the middle 
point of the band on an ordinary chart. Figure 15.9 shows a rounded chart in which possible 
action is limited to $-2$-, $-1$-, 0-, 1-, or 2-unit catalyst formulation changes. The increase in
mean square error (usually small), which results from using the rounded scheme, is often
outweighed by the convenience of working with a small number of standard adjustments.
A convenient width for the rounded bands is about one standard deviation $\sigma_\varepsilon$ or a little
less. Justification for the use of such charts was provided by Box and Jenkins (1976,
Section 13.1), where consideration is given to the effects of errors in the adjustment $x_t$.
Note that the use of all these manual adjustment charts requires no calculation; they are
simple and entirely graphical.

FIGURE 15.9 Rounded adjustment chart for proportional-integral control.


15.2.6 Complementary Roles of Monitoring and Adjustment 

It is sometimes complained that feedback control can conceal the nature of a compensated 
disturbance that otherwise might be eliminated. However, when combined with appropriate 
monitoring, this need not happen. Adjustment schemes and monitoring schemes are complementary
and should be used in consort. Figure 15.10 illustrates the point. This shows
the behavior of a simulated feedback scheme in which the disturbance is an IMA(0,1,1)
process with $\lambda = 0.2$ and the process dynamics are represented by a first-order system
(15.2.13) with $\delta = 0.5$ and $g = 1.0$. The calculations were made assuming that the system
is controlled by the PI controller,

$$-X_t = \text{constant} + 0.20\varepsilon_t + 0.20\sum_{i=1}^{t}\varepsilon_i \tag{15.2.27}$$

which, for these stated parameter values, produces MMSE. Although this is not usually
done, the control action $X_t$ in Figure 15.10(b), as well as the deviation from target
$\{\varepsilon_t\}$ in Figure 15.10(d), can be charted (or better still, displayed on the screen of a process
computer). Assuming the dynamics known, the exact compensation $\mathcal{Y}_t$ shown in
Figure 15.10(c) can also be computed and hence the original disturbance $N_t$ of
Figure 15.10(a) can be reconstructed.
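
A minimal sketch of this reconstruction is given below, assuming the first-order dynamics $(1-\delta B)\mathcal{Y}_t = (1-\delta)gX_{t-1+}$, an output deviation equal to disturbance plus compensation, and zero values of the compensation and of $X$ before the start of the record. The function name and the short input series are hypothetical.

```python
def reconstruct_disturbance(eps, X, g, delta):
    """Recover N_t = eps_t - Y_t from recorded deviations and control levels.

    eps[t] : observed deviation from target at time t
    X[t]   : level of the compensating variable set just after observation t
    Assumes Y and X are zero before the first observation.
    """
    N = []
    y_prev, x_prev = 0.0, 0.0
    for e, x in zip(eps, X):
        # Compensation reaching the output at time t (depends on X set at t-1)
        y = delta * y_prev + (1.0 - delta) * g * x_prev
        N.append(e - y)
        y_prev, x_prev = y, x
    return N


# Tiny made-up record, only to show the call
print(reconstruct_disturbance([1.0, 0.5, 0.25], [-1.0, -1.0, 0.0], g=1.0, delta=0.5))
```

Plotting the returned series next to the recorded $X_t$ and $\varepsilon_t$ gives the kind of four-panel display described for Figure 15.10.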

Examination of these monitoring displays motivates a generalized concept of common 
and special causes. The disturbance and the dynamic system together define the common 
cause system, which is taken account of in the design of the controller. But management 
action could change the system and hence the appropriate form of control. For example, 
suppose it was discovered that in the operation of the system, the pattern of the feedback
control action $X_t$ shown in Figure 15.10(b) mirrored that of a particular impurity in
the feedstock. If this correlation checked out as a causative relation, management might
decide to change the control system either by removing the impurity from the feedstock
before it reached the process, or if that were impossible or too expensive, by measuring it
and compensating for it by appropriate feedforward control.

FIGURE 15.10 (a) Disturbance $N_t$, (b) feedback control action $X_t$, (c) compensation of the
disturbance $\mathcal{Y}_t$, and (d) resulting deviation $\varepsilon_t$ from the target value.

In addition, a special cause producing a temporary deviation from the underlying system 
model, induced perhaps by misoperation of the controller or a mistake by the operator, can 
be evidenced in the residual sequence $\{\varepsilon_t\}$ leading to remedial action. To illustrate this, we
have added a deviation of size $3\sigma_a$ to the 30th value of the disturbance $N_t$ in Figure 15.10(a).
After the disturbance has been subjected to feedback control, this outlier is clearly visible
in the record of the deviations $\varepsilon_t$ from target plotted as a Shewhart chart in Figure 15.10(d).
The control limits can be calculated directly from the models used to design the controller
or from the record of the $\varepsilon_t$'s during stable operation. Also, as noted later in Section 15.6,
more specific checks may be applied to detect possible changes in the system parameters.

Assuming the models correct, in this particular example the residual $\varepsilon_t$'s will be a
white noise sequence. For control schemes that are not MMSE or that allow for dead time,
however, the sequence $\{\varepsilon_t\}$ will, in general, be autocorrelated. One way to allow for this is
to filter $\{\varepsilon_t\}$ suitably to produce a sequence that, given the assumed model, will be white
noise. Appropriate checks may then be applied to that series.

15.3 EXCESSIVE ADJUSTMENT SOMETIMES REQUIRED BY MMSE
CONTROL


One rationalization for the use of integral control and proportional-integral control is that
for perhaps the simplest models for disturbance [equation (15.2.10)] and dynamics [equations
(15.2.12) and (15.2.13)], which approximate reality, these forms of feedback adjustment
can produce minimum mean square error.$^2$ Unfortunately, MMSE control sometimes
requires unacceptably large manipulations of the compensating variable $X_t$. For illustration,
consider again the situation where to an adequate approximation the disturbance model is
the IMA(0,1,1) model of equation (15.2.10) with parameter $\theta$ and the dynamic model is
the first-order difference equation (15.2.13) with parameters $\delta$ and $g$. Then, the MMSE
feedback control adjustment scheme can be written (see (15.2.14)) as

$$x_t = -\frac{\lambda}{g}\,\frac{1-\delta B}{1-\delta}\,\varepsilon_t = -\frac{\lambda}{g(1-\delta)}(\varepsilon_t - \delta\varepsilon_{t-1}) \tag{15.3.1}$$

where $\lambda = 1-\theta$ and $\varepsilon_t = a_t$. If $\delta$ is negligibly small, MMSE control will be obtained with
$x_t = -(\lambda/g)\varepsilon_t$ and let us then write

$$\sigma_x^2 = \operatorname{var}[x_t] = \frac{\lambda^2}{g^2}\sigma_a^2 = k \tag{15.3.2}$$

But then, when $\delta$ is not negligible,

$$\sigma_x^2 = k\,\frac{1+\delta^2}{(1-\delta)^2}$$
Thus, if $\delta$ were near its upper limit of unity, $\sigma_x^2$ could become very large. For example, with
$\delta = 0.9$ (so that only 1/10 of the eventual change produced by a step input is experienced
in the first interval), $\sigma_x^2 = 181k$. In fact, as $\delta$ approaches unity, the MMSE control action
in equation (15.3.1) takes on more and more of an “alternating” character,$^3$ the adjustment
made at time $t$ reversing a substantial portion of the adjustment made at time $t-1$.
The reason for such alternating and variable adjustment can also be understood from the
consideration that with $\delta = 0.9$, the constant $P = \xi = 9$ of the manual adjustment chart for
MMSE control would call for extrapolation of the line joining $\varepsilon_{t-1}$ and $\varepsilon_t$ by nine sampling
intervals! In practice, constrained schemes can be used that at the expense of rather small
increases in MSE at the output require much less compensatory manipulation.
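
The growth of the adjustment variance with $\delta$ follows from (15.3.1): with $\varepsilon_t = a_t$ white noise, $\operatorname{var}[\varepsilon_t - \delta\varepsilon_{t-1}] = (1+\delta^2)\sigma_a^2$, so that $\sigma_x^2/k = (1+\delta^2)/(1-\delta)^2$. The short Python sketch below simply tabulates this ratio; the function name is an illustrative choice.

```python
def adjustment_variance_ratio(delta):
    """Ratio sigma_x^2 / k for x_t = c * (eps_t - delta * eps_{t-1}),
    with c = -lambda / (g * (1 - delta)), white-noise eps_t,
    and k = (lambda / g)**2 * sigma_a**2."""
    return (1.0 + delta ** 2) / (1.0 - delta) ** 2


for d in (0.0, 0.2, 0.5, 0.9):
    print(d, round(adjustment_variance_ratio(d), 2))
```

For $\delta = 0.9$ the ratio is 181, the figure quoted above, whereas for $\delta = 0.5$ it is only 5.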


$^2$ This theoretical formulation, which results in a discrete PI controller yielding MMSE, is, however, not unique. For
example, a PI controller giving MMSE can be obtained from the models $\mathcal{Y}_t = gBX_{t+}$ and $\nabla N_t = (1 - \theta_1B - \theta_2B^2)a_t$,
as well as the dynamics model (15.2.13) with IMA(0,1,1) noise model (15.2.10).

$^3$ A value of $\delta = 0.9$ corresponds to a time constant for the system of over nine sampling intervals. The occurrence
of such a value would immediately raise the question as to whether the sampling interval being taken was too
short; whether in fact the inertia of the process was so large that little would be lost by less frequent surveillance.
Now (see Appendix A15.2) the question of the choice of sampling interval must depend on the nature of the noise
that infects the system. Because the properties of the noise usually reflect system inertia as well, in many cases it
would be concluded that the sampling interval should be increased.

15.3.1 Constrained Control

When the adjustments $x_t$ form a stationary time series, such constrained control schemes
can be obtained by finding an unconstrained minimum of the expression

$$\sigma_\varepsilon^2 + \alpha\sigma_x^2 \tag{15.3.3}$$

where $\alpha$ can be regarded as an undetermined multiplier that allocates the relative quadratic
costs of variations of $\varepsilon_t$ and $x_t$. Such a scheme will be called a constrained MMSE scheme
or CMMSE scheme. In particular, we have seen that for an IMA(0,1,1) disturbance and
first-order dynamics, the unconstrained MMSE scheme calls for an adjustment of

$$x_t = -\frac{\lambda}{g}(1+\xi\nabla)\varepsilon_t = -\frac{\lambda(1-\delta B)}{g(1-\delta)}\varepsilon_t \tag{15.3.4}$$

It is shown in Appendix A15.1 (see equation (A15.1.27)) that the corresponding CMMSE
is of the form

$$x_t = [k_1 + (1-\lambda)k_0]x_{t-1} - (1-\lambda)k_1x_{t-2} - \frac{\lambda(1-k_0)(1-\delta B)}{g(1-\delta)}\varepsilon_t \tag{15.3.5}$$

where $k_0$ and $k_1$ are fairly complicated functions of the parameters $g$, $\lambda$, $\delta$, and $\alpha$. A table
for applying such control is also given in Appendix A15.1.

For illustration suppose that $\lambda = 0.6$, $\delta = 0.5$, and $g = 1$; then the optimal unconstrained
MMSE scheme is

$$x_t = -1.2(1 - 0.5B)\varepsilon_t \tag{15.3.6}$$

with

$$\sigma_x^2 = (0.6)^2\,\frac{1 + (0.5)^2}{(1 - 0.5)^2}\,\sigma_a^2 = 1.80\sigma_a^2$$

from (15.3.2)-(15.3.2a), and $\sigma_\varepsilon^2 = \sigma_a^2$. Suppose that this amount of variation in the adjustment
$x_t$ produced difficulties in process operation and it was desired to reduce it so that $\sigma_x^2$
was about $0.50\sigma_a^2$. Use of Table A15.2 shows that this can be achieved with the scheme

$$x_t = 0.32x_{t-1} - 0.06x_{t-2} - (0.57 \times 1.2)(1 - 0.5B)\varepsilon_t \tag{15.3.7}$$

which reduces $\sigma_x^2$ to $0.47\sigma_a^2$ with $\sigma_\varepsilon^2 = 1.07\sigma_a^2$. Thus, an almost fourfold reduction in $\sigma_x^2$
is produced for an increase of only 7% in the output variance. Such optimal constrained
schemes are extremely attractive since they often produce a very large reduction in $\sigma_x^2$ for
only a small increase in $\sigma_\varepsilon^2$. See, for example, Whittle (1963), Tunnicliffe Wilson (1970a,
1970b), MacGregor (1972), Box and Jenkins (1976), Harris et al. (1982), Åström and
Wittenmark (1984), Rivera et al. (1986), and Bergh and MacGregor (1987). Unfortunately,
such schemes can become complicated.

In practice, however, exact “optimality” is to some extent an illusion because assumptions
are never true. It turns out that a form of constrained control, which is almost as good
as CMMSE control, can often be obtained using an appropriately tuned PI controller. Such
a controller has the advantage that it is simple and, in particular, is easily adapted to manual
control. The following example shows how suitably tuned PI controllers can do almost as
well as optimal constrained schemes in producing great reductions in the variance $\sigma_x^2$ of
the adjustment for only modest increases in the output variance $\sigma_\varepsilon^2$.

TABLE 15.1 Illustrative Results Comparing Different Control Schemes for Models (15.2.13)
and (15.2.10), with $g = 0.4$, $\delta = 0.5$, $\lambda = 0.4$, and $\sigma_a^2 = 1$

Scheme                                Control equation                                                                $\sigma_\varepsilon^2$    $\sigma_x^2$
(a) MMSE control                      $-x_t = (1 + \nabla)\varepsilon_t$                                              1       5
(b) Optimal constrained control       $x_t = 0.82x_{t-1} - 0.21x_{t-2} - 0.39\varepsilon_t + 0.19\varepsilon_{t-1}$   1.20    0.25
(c) Optimal constrained PI control    $-x_t = 0.52(1 - 0.25\nabla)\varepsilon_t$                                      1.20    0.25

As an illustration, consider once again the situation where the process disturbance is
represented by an IMA(0,1,1) process of (15.2.10) and the process dynamics by the first-order
system (15.2.13), that is,

$$(1-\delta B)\mathcal{Y}_t = (1-\delta)gBX_{t+}$$

and suppose that $\lambda = 0.4$, $\sigma_a^2 = 1$, $g = 0.4$, and $\delta = 0.5$, so that $\xi = \delta/(1-\delta) = 1$. Then
minimum mean square error control is achieved by the PI scheme (a) shown in Table 15.1,
yielding an output variance $\sigma_\varepsilon^2$ of 1.00 with $\sigma_x^2 = 5$. Using the optimal constrained control
equation (b) in Table 15.1, it is possible to achieve a 20-fold reduction in $\sigma_x^2$ (to 0.25) at the
expense of a 20% increase in $\sigma_\varepsilon^2$ to 1.20. But almost nothing is lost by, instead, using the
much simpler optimal constrained PI controller (c) in Table 15.1 for which, to two-decimal
accuracy, the same result is obtained. Notice that if we use a manual adjustment chart for
the MMSE PI scheme (a), it would be necessary to extrapolate one whole time period ahead
from the current time $t$. However, for the constrained PI control (c), we must interpolate a
quarter of a period back from the current time $t$. This accounts for the much greater stability
of the latter scheme. A fuller discussion of this topic can be found in Box and Luceño
(1993).
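
The entries for schemes (a) and (c) can be checked approximately by simulation. The following Python sketch is a Monte Carlo illustration, not the method used to construct Table 15.1: it assumes the closed loop $\varepsilon_t = N_t + \mathcal{Y}_t$ with $(1-B)N_t = (1-\theta B)a_t$, $(1-\delta B)\mathcal{Y}_t = (1-\delta)gX_{t-1+}$, and the PI adjustment $x_t = -G(1+P\nabla)\varepsilon_t$; the function name, run length, and seed are arbitrary choices.

```python
import random


def simulate_pi(G, P, lam=0.4, g=0.4, delta=0.5, n=100_000, seed=1):
    """Simulate PI adjustment of first-order dynamics with IMA(0,1,1) noise.

    Returns (variance of eps_t, variance of x_t), with sigma_a = 1.
    """
    rng = random.Random(seed)
    theta = 1.0 - lam
    N = Y = X = 0.0              # disturbance, compensation, current level of X
    a_prev = eps_prev = 0.0
    eps_list, x_list = [], []
    for _ in range(n):
        a = rng.gauss(0.0, 1.0)
        N += a - theta * a_prev                  # (1 - B) N_t = (1 - theta B) a_t
        Y = delta * Y + (1.0 - delta) * g * X    # X still holds X_{t-1+} here
        eps = N + Y                              # deviation from target
        x = -G * ((1.0 + P) * eps - P * eps_prev)
        X += x                                   # new level X_{t+}
        eps_list.append(eps)
        x_list.append(x)
        a_prev, eps_prev = a, eps

    def var(v):
        m = sum(v) / len(v)
        return sum((u - m) ** 2 for u in v) / len(v)

    return var(eps_list), var(x_list)


print(simulate_pi(G=1.0, P=1.0))      # scheme (a): -x_t = (1 + nabla) eps_t
print(simulate_pi(G=0.52, P=-0.25))   # scheme (c): -x_t = 0.52 (1 - 0.25 nabla) eps_t
```

Up to sampling error, the first call should return variances near 1 and 5 and the second near 1.20 and 0.25, matching rows (a) and (c) of the table.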


15.4 MINIMUM COST CONTROL WITH FIXED COSTS OF ADJUSTMENT 
AND MONITORING 

From the point of view of cost, we can summarize the discussion so far as follows. If we 
assume that the only control cost we need to consider is that of being off target and that 
this cost is proportional to the square of the deviation from target, unconstrained minimum 
mean square error control implies minimization of the total cost of the scheme. Suppose, 
however, that there is an additional quadratic loss associated with the size of the adjustment
$x_t$, and that $\alpha$ is some measure of the relative cost of being off target and of making
adjustments. Then, $\sigma_\varepsilon^2 + \alpha\sigma_x^2$ can be a measure of the overall cost of the scheme, and
minimization of this quantity can produce a control scheme yielding minimum cost, and,
as we have seen, suitably chosen PI schemes can often do almost as well. In either case, in
practice, it is rarely easy to gauge $\alpha$ in terms of relative costs. Instead, choice of a suitable
scheme can be made by empirical judgment of what constitutes a satisfactory reduction of
$\sigma_x^2$ in exchange for an acceptable increase in $\sigma_\varepsilon^2$. The same kinds of considerations apply to
systems for which there are fixed adjustment and monitoring costs.

15.4.1 Bounded Adjustment Scheme for Fixed Adjustment Cost

Especially in the “parts” industries, situations occur where an adjustment often has immediate
effect but entails a fixed cost incurred, for example, by stopping a machine or
changing a tool.

Bounded Adjustment Charts. It was shown by Box and Jenkins (1963) that in the latter
case, on the assumption of a quadratic off-target loss and an IMA disturbance, the minimum
cost feedback control is not achieved by repeated adjustment after each observation. Instead,
it requires that an adjustment be made only when an exponentially weighted average $\hat{\varepsilon}_t(1)$
of the deviations from target falls outside some fixed limits, $\pm L$, say. We call this bounded
adjustment. The adjustment that should then be made is the one that will produce a change
$-\hat{\varepsilon}_t(1)$ at the output. Such an adjustment can be put into effect manually using a “bounded
adjustment chart” such as that discussed below, or automatically.

A bounded adjustment chart such as that shown in Figure 15.11 is superficially similar to 
that proposed for process monitoring by Roberts (1959). However, its purpose and design 
are different. The purpose is to decide when, and by how much, to adjust the process. The 
boundary lines are designed to minimize the overall cost, taking into account both the cost 
of making adjustments and the cost of being off target. Their purpose is not to discover 
statistically significant deviations from target. As the cost of adjustment approaches zero, 
the lines come closer together, converging on the target value when the cost of adjustment 
is zero and so yielding the “repeated adjustment” MMSE scheme.

Figure 15.11 shows an example of such a chart for the metallic thickness control problem
that would be appropriate if there had been a fixed cost for changing the deposition rate $X$.
As before, $\lambda = 0.2$, $g = 1.2$, and $\sigma_a = 11$. At time $t$, an open circle represents the deviation
from target $\varepsilon_t$ obtained after periodically changing the deposition rate $X_t$ as required by
the chart. A filled circle represents an appropriate exponentially weighted moving average
forecast. This is conveniently updated using the formula

$$\hat{\varepsilon}_t(1) = \lambda\varepsilon_t + \theta\hat{\varepsilon}_{t-1}(1)$$

FIGURE 15.11 Bounded adjustment chart: the open circles are the thickness deviations $\varepsilon_t$ (after
adjustment), the filled circles are their EWMA forecasts $\hat{\varepsilon}_{t-1}(1)$ of these deviations.
(Legend: open circles, adjusted thickness values; filled circles, EWMA forecasts. Vertical axis: adjusted thickness.)

The particular chart shown has boundary lines at $80 \pm 8$, that is, at $T \pm 0.72\sigma_a$. We
discuss the rationale for this choice below. To understand how the chart operates, suppose
initially that the deposition rate is some value $X_0$. This will remain unchanged until time
$t = 13$, when the forecasted value 88.7 (i.e., $\hat{\varepsilon}_t(1) = 8.7$) falls outside the upper limit and
the chart signals that a change is needed in the deposition rate that will change the thickness
by $-8.7$. An adjustment of

$$X_{13} - X_0 = -8.7/1.2$$

is now made in the deposition rate. Notice that such an adjustment does not upset the
calculation of the next EWMA. For example, the forecasted thickness at time $t = 14$ is

$$(0.2 \times 81.3) + (0.8 \times 80.0) = 80.3$$

where 80 is the appropriate previous forecasted value after the adjustment has been made
to bring the process on target.
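
The operation of the chart can be summarized as a single update step. The sketch below is an illustration under the stated model (EWMA forecast with weight $\lambda$, limits at $\pm L$ about the target, and an adjustment with immediate effect and gain $g$); the function name and return convention are assumptions, and deviations are measured from the target of 80.

```python
def bounded_adjustment_step(deviation, prev_forecast, lam, L, g):
    """One step of a bounded adjustment chart.

    deviation     : observed deviation from target, eps_t
    prev_forecast : previous EWMA forecast (0 immediately after an adjustment)
    Returns (forecast, adjustment to X, forecast to carry to the next step).
    """
    forecast = lam * deviation + (1.0 - lam) * prev_forecast  # EWMA update
    if abs(forecast) > L:
        # Change X so that the output moves by -forecast, then restart on target
        return forecast, -forecast / g, 0.0
    return forecast, 0.0, forecast


# Time t = 14 of the thickness example: observation 81.3, previous forecast reset to target
forecast, adjustment, carry = bounded_adjustment_step(81.3 - 80.0, 0.0, lam=0.2, L=8.0, g=1.2)
print(round(80.0 + forecast, 1), adjustment)
```

Here the updated forecast is about 80.3 and no adjustment is called for, as in the text. Resetting the carried forecast to zero after an adjustment corresponds to using the target value 80 as the previous forecast.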

15.4.2 Indirect Approach for Obtaining a Bounded Adjustment Scheme 

Tables for calculating the positions of the appropriate limit lines for minimum cost schemes 
in terms of the cost of being off target and the cost of adjustment were provided by Box and 
Jenkins (1963), Box et al. (1974), and Box and Kramer (1992). However, as we said earlier, 
these costs are not always easy to assess, and it seems more practical to use these results to 
provide an envelope of minimum cost schemes and then to choose among them empirically 
by considering the increased standard deviation at the output obtained in exchange for a 
longer interval between making adjustments. This approach was illustrated by Box (1991b). 
Table 15.2 shows theoretical average adjustment intervals (AAIs) and percent increase in
standard deviation (ISD) of the adjusted process for various values of $\lambda$ and $L/\sigma_a$, where
limit lines of the bounded adjustment scheme are at $T \pm L$.

For illustration, consider again the thickness adjustment example. Entering Table 15.2
with $\lambda = 0.2$ shows how much inflation in the error standard deviation would occur for a
bounded scheme for various choices of $L/\sigma_a$. Thus, if $L/\sigma_a$ were set equal to 0.5, a 2.6%
increase in the standard deviation would occur, but on the average, adjustments would be
needed only every 10 intervals. If $L/\sigma_a$ were set equal to 1.0, a 9% increase in standard
deviation would result, but the AAI would be 32. The scheme depicted in Figure 15.11 is a
compromise in which $L/\sigma_a$ was set equal to 0.72, which rough interpolation shows would
give a 5% increase in the standard deviation with an AAI of about 20. To achieve this, $L$
was set equal to $8 \approx 0.72 \times 11$. A Monte Carlo study using the 100 observations of metallic
thickness graphed in Figure 15.2 shows an actual inflation of the standard deviation of
8.5% for this example with an AAI of 14. In view of the rather limited sample size, the
agreement must be considered quite good.
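
The “rough interpolation” can be made explicit. The Python sketch below interpolates linearly between the two Table 15.2 rows for $\lambda = 0.2$ that bracket $L/\sigma_a = 0.72$; linear interpolation is an assumption made here for illustration, and the names are hypothetical.

```python
# Rows of Table 15.2 for lambda = 0.2: L/sigma_a -> (AAI, ISD in percent)
rows = {0.5: (10, 2.6), 1.0: (32, 9.0)}


def interpolate(ratio, lo=0.5, hi=1.0):
    """Linear interpolation of (AAI, ISD) between two tabulated L/sigma_a values."""
    w = (ratio - lo) / (hi - lo)
    aai = rows[lo][0] + w * (rows[hi][0] - rows[lo][0])
    isd = rows[lo][1] + w * (rows[hi][1] - rows[lo][1])
    return aai, isd


aai, isd = interpolate(0.72)
print(round(aai), round(isd, 1))
```

This gives an AAI of roughly 20 and an ISD of a little over 5%, consistent with the compromise described above.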

Interpolation Chart. Any degree of technological sophistication can be used in applying 
these ideas: anything from transducers taking actions calculated by computers to operators 
taking actions based on a simple interpolation chart such as that shown in Figure 15.12, 
which used a pushpin and a piece of thread to indicate the appropriate manual adjustment.

TABLE 15.2 Average Adjustment Interval (AAI) and Percent Increase in Standard Deviation
of Output (ISD) for Various Choices of $L/\sigma_a$ Where the Limit Lines Are at $T \pm L$

$\lambda$    $L/\sigma_a$    AAI    ISD (%)
0.1    0.5     32     2.4
       1.0     112    9
       1.5     243    18
       2.0     423    30
0.2    0.5     10     2.6
       1.0     32     9
       1.5     66     20
       2.0     112    32
0.3    0.5     5      2.6
       1.0     16     10
       1.5     32     20
       2.0     52     33
0.4    0.5     4      2.6
       1.0     10     10
       1.5     19     21
       2.0     32     34
0.5    0.5     3      2.5
       1.0     7      10
       1.5     13     21
       2.0     21     35

Source: Box (1991b).


In the situation depicted, a previous forecast made at time $t-1$ was 86 and the observation,
which has just been made at time $t$, is 66. Just before the current time $t$, therefore, the
location of the pushpin on the current forecast scale would be at 86 with the thread hanging
down from the pin. As soon as the actual value 66 became available, the thread would be
pulled tightly to join the point 66 on the right-hand scale. The updated forecast of 82 would
then be read off on the intermediate scale. This value lies within the boundaries, so the
pushpin would be moved down to this new current forecast value with the thread hanging
loose again until the next observation became available to produce a new updated forecast. 
As soon as an updated forecast fell outside either boundary, the appropriate adjustment 
in deposition rate to cancel out the forecasted deviation would be made, and the pushpin 
would then be placed on the target value ready for the next interpolation. 

15.4.3 Inclusion of the Cost of Monitoring 

It was shown by Box and Kramer (1992) how these results could be extended to the case 
where the cost of monitoring the process had also to be taken into account. They considered 
the possibility of further reducing cost by less frequent monitoring at an interval m instead 
of at a unit interval. They provided charts for obtaining minimum cost schemes given 
that in addition to $\sigma_a$ and $\lambda$ (estimated from plant data), three cost constants were known:
(1) the (assumed quadratic) cost of being off target, (2) the fixed cost of making a change,
and (3) the fixed monitoring cost of taking an observation. Given this information, the
corresponding values of $L/\sigma_a$ and of $m$ yielding minimum cost could be read off their
charts.

FIGURE 15.12 Interpolation chart to update the forecasted value of thickness and to indicate when
and by how much the deposition rate should be adjusted.
(Chart labels: previous forecast, updated forecast, current observation; adjust deposition rate.)

Again, these three individual costs may not be easy to determine, and Box and Luceño
(1993) used their results to allow the choice of scheme to be based on empirical judgment.
The charts shown in Figure 15.13 give the values of the AAI and the percent ISD with
respect to $\sigma_a$ corresponding to values of the nonstationarity measure $\lambda = 0.1(0.1)0.6$, 0.8,
and 1.0, the standardized action limit $L/\sigma_a = 0.0(0.25)2.5$, and the monitoring interval
$m = 1, 2, 3, \ldots$. The charts cover small to moderate increases in the output standard deviation
such as might be needed in practice. Thus, the larger values of $m$ appear only with smaller
values of $\lambda$.

For example, we saw earlier that by using a bounded adjustment chart with $L/\sigma_a = 0.72$
instead of a continuous scheme, the average adjustment interval could be increased to about
20 at the cost of an increase of 5% in the standard deviation. This is confirmed by the chart
of Figure 15.13 for $\lambda = 0.2$, which also shows, for example, that if we monitor the process
half as frequently ($m = 2$) and we again set $L/\sigma_a = 0.72$, we could obtain about the same
average adjustment interval (20) but with an 8% increase in the standard deviation.

FIGURE 15.14 System at time $t$ subject to an observed input disturbance $u_t$ and unobserved
disturbance $N_t$, with potential compensating variable $X_t$.


15.5 FEEDFORWARD CONTROL 

We now consider the design of discrete feedforward control schemes that give minimum 
mean square error at the output. A situation arising in the manufacture of a polymer 
is illustrated in Figure 15.14. The viscosity $Y_t$ of the product is known to vary in part
due to fluctuations in the feed concentration $u_t$, which can be observed but not changed.
The steam pressure $X_t$ is a control variable that is measured, can be manipulated, and is
potentially available to alter the viscosity by any desired amount and hence compensate
potential deviations from target. The total effect in the output viscosity of all other sources
of disturbance at time $t$ is denoted by $N_t$.

15.5.1 Feedforward Control to Minimize Mean Square Error at the Output 

We can suppose that $Y_t$, $u_t$, $X_t$, $N_t$ are deviations from reference values, which are such
that if the conditions $u = 0$, $X = 0$, $N = 0$ were continuously maintained, then the process
would remain in an equilibrium state such that the output was exactly on the target value
$Y = 0$.

The transfer function model, which connects the observed but uncontrollable input
disturbance $u_t$ (feed concentration) and the output $Y_t$ (viscosity), is assumed to be

$$\mathcal{Y}_{1t} = \delta^{-1}(B)\omega(B)B^{b}u_t$$

Now, changes will be made in $X$ at times $t, t-1, t-2, \ldots$ immediately after the observations
$u_t, u_{t-1}, u_{t-2}, \ldots$ are taken. Hence, we obtain a “pulsed” input, and we denote the
level of $X$ in the interval $t$ to $t+1$ by $X_{t+}$. For this pulsed input, it is assumed that the
transfer function model, which connects the compensating variable $X_t$ (steam pressure)
and the output $Y_t$ (viscosity), has the effect

$$\mathcal{Y}_{2t} = L_1^{-1}(B)L_2(B)B^{f+1}X_{t+}$$

where $L_1(B)$ and $L_2(B)$ are polynomials in $B$. Then, if no control is exerted (the potential
compensating variable $X_t$ is held fixed at $X_t = 0$), the total error or deviation from target
value $T = 0$, $\varepsilon_t = Y_t - T$, in the output viscosity will be

$$\varepsilon_t = \delta^{-1}(B)\omega(B)u_{t-b} + N_t$$

Clearly, it ought to be possible to compensate the effect of the measured parts of the
overall disturbance by manipulating $X_t$. Now at time $t$, and at the point P in Figure 15.14,

1. The total effect of the input disturbance ($u$) is

$$\delta^{-1}(B)\omega(B)u_{t-b}$$

2. The total effect of the compensation ($X$) is

$$L_1^{-1}(B)L_2(B)X_{t-f-1+}$$

and we assume that the effects of the input influences $u$ and $X$ on the output $Y$ are
additive. Then, the effect of the observed input disturbance $u$ will be canceled if we
set

$$L_1^{-1}(B)L_2(B)X_{t-f-1+} = -\delta^{-1}(B)\omega(B)u_{t-b}$$

Thus, the control action at time $t$ should be such that

$$L_1^{-1}(B)L_2(B)X_{t+} = -\delta^{-1}(B)\omega(B)u_{t-(b-f-1)} \tag{15.5.1}$$


Case 1: $b \ge f+1$. Now at time $t$, the values $u_{t+1}, u_{t+2}, \ldots$ are unknown. The control action
(15.5.1) is directly realizable, therefore, only if $(b-f-1) \ge 0$, in which case the desired
control action at time $t$ is to set the manipulated variable $X$ to the level

$$X_{t+} = -\frac{L_1(B)\omega(B)}{L_2(B)\delta(B)}\,u_{t-(b-f-1)} \tag{15.5.2}$$

Alternatively, it is often more convenient to define the control action in terms of the change
$x_t = X_{t+} - X_{t-1+}$, which is to be made in the level of $X$ immediately after the observation
$u_t$ has come to hand. This is

$$x_t = -\frac{L_1(B)\omega(B)}{L_2(B)\delta(B)}\,(1-B)u_{t-(b-f-1)} \tag{15.5.3}$$

The situation is illustrated in Figure 15.14. The effect at P from the control action is
$-\delta^{-1}(B)\omega(B)u_{t-b}$, and this exactly cancels the effect at P of the input disturbance. The
component of the deviation from target due to $u_t$ is (theoretically at least) exactly eliminated
at the observation times, and only the component $N_t$ due to the unobserved disturbance
remains.


Case 2: $(b - f - 1)$ Negative. It can happen that $f + 1 > b$. This means that an observed
input disturbance reaches the output before it is possible for compensating action to become
effective. In this case the action in (15.5.2) is not realizable because at time $t$, when the
action is to be taken, the relevant value $u_{t+(f+1-b)}$ of the input disturbance is not yet
available. One would usually avoid this situation if one could (if some quicker acting
compensating variable could be used instead of $X$), but sometimes such an alternative is not
available.

Now with $u'_t = \delta^{-1}(B)\omega(B)u_t$ represented by the linear model (see, for example, Box
et al. (1974))

$$u'_t = \left(1 + \sum_{i=1}^{\infty}\psi'_i B^i\right)a_t$$

where $a_t$ is a white noise process with mean zero and variance $\sigma_a^2$, then

$$u'_{t+f+1-b} = \hat{u}'_t(f+1-b) + e'_t(f+1-b)$$

In this expression

$$e'_t(f+1-b) = a_{t+f+1-b} + \psi'_1 a_{t+f-b} + \cdots + \psi'_{f-b}a_{t+1}$$

is the forecast error. Then, we can write the right-hand side of (15.5.2) in the form

$$-L_1(B)L_2^{-1}(B)\hat{u}'_t(f+1-b) - L_1(B)L_2^{-1}(B)e'_t(f+1-b)$$

Now, $e'_t(f+1-b)$ is a function of the uncorrelated random variates $a_{t+h}$ ($h \geq 1$), which
have not yet occurred at time $t$ and which are uncorrelated with any variable known at time
$t$ (and the $a_{t+h}$ are therefore not forecastable). It follows that the optimal (minimum mean
square error) action is achieved by setting

$$X_{t+} = -\frac{L_1(B)}{L_2(B)}\,\hat{u}'_t(f+1-b) \tag{15.5.4}$$

that is, by making the change in the compensating variable at time $t$ equal to

$$x_t = -\frac{L_1(B)}{L_2(B)}\,\{\hat{u}'_t(f+1-b) - \hat{u}'_{t-1}(f+1-b)\} \tag{15.5.5}$$

This results in an additional component in the deviation $\varepsilon_t$ from the target, which now
becomes

$$\varepsilon_t = N_t + e'_{t-f-1}(f+1-b)$$

If the model for the input disturbance is $\varphi_u(B)u_t = \theta_u(B)a_t$, then the model for
$u'_t = \delta^{-1}(B)\omega(B)u_t$ can be written

$$\varphi'_u(B)u'_t = \theta'_u(B)a_t$$

with

$$\varphi'_u(B) = \varphi_u(B)\delta(B) \quad \text{and} \quad \theta'_u(B) = \theta_u(B)\omega(B)$$

The needed forecasts $\hat{u}'_t(f+1-b)$, obtained as in Chapter 5, can then be written
conveniently in terms of previous $u$'s and $a$'s obtainable from the $u$ series itself.

15.5.2 An Example: Control of the Specific Gravity of an Intermediate Product 

In the manufacture of an intermediate product, used for the production of a synthetic resin,
the specific gravity $Y_t$ of the product had to be maintained as close as possible to the value
1.260. This was actually achieved by a mixed scheme of feedforward and feedback control.
We consider the complete scheme later and discuss here only the feedforward part. The
process has rather slow dynamics, and also the disturbance is known to change slowly,
so that observations and adjustments are made at 2-hour intervals. The uncontrolled input
disturbance that is fed forward is the feed concentration $u_t$, which is measured as deviations
from an origin of 30 g/L. The relation between specific gravity and feed concentration over
the range of normal operation has the effect

$$y_{1t} = 0.0016u_t$$

where the effect $y_{1t}$ is measured from the target value 1.260.

This relation contains "no dynamics" because the feed concentration can only be
measured at the inlet to the reactor, so that in our general notation $\delta(B) = 1$,
$\omega(B) = 0.0016$, $b = 0$. Control is achieved by varying pressure, which is referred to a
convenient origin of 25 psi. The transfer function model relating specific gravity and
pressure $X_t$ was estimated as having the effect

$$(1 - 0.7B)y_{2t} = 0.0024X_{t-1+}$$

so that $L_1(B) = (1 - 0.7B)$, $L_2(B) = 0.0024$, $f = 0$. So far as could be ascertained, the
effects of pressure and feed concentration were approximately additive in the region of
normal operation. Therefore, the control equation (15.5.4) is used, since $b - f - 1$ is
negative, and yields

$$X_{t+} = -\frac{(1 - 0.7B)\,0.0016}{0.0024}\,\hat{u}_t(1) \tag{15.5.6}$$

for, in this particular example, $u'_t = 0.0016u_t$ and hence $\hat{u}'_t(1) = 0.0016\hat{u}_t(1)$. Study
of the feed concentration showed that it could be represented by the linear stochastic model
of order (0,1,1),

$$\nabla u_t = (1 - \theta_u B)a_t$$

with $\theta_u = 0.5$. For such a process,

$$\hat{u}_t(1) = (1 - \theta_u)u_t + \theta_u\hat{u}_{t-1}(1)$$

that is, $(1 - \theta_u B)\hat{u}_t(1) = (1 - \theta_u)u_t$ or

$$\hat{u}_t(1) = \frac{1 - \theta_u}{1 - \theta_u B}\,u_t$$
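The one-step-ahead forecast recursion for the (0,1,1) model can be sketched in Python as follows. This is only a minimal illustration: the helper name `ewma_forecasts`, the starting forecast `start`, and the two data values are assumptions made for the example and do not come from the text.

```python
def ewma_forecasts(u, theta, start):
    """One-step-ahead forecasts for an IMA(0,1,1) series.

    Applies u_hat_t(1) = (1 - theta) * u_t + theta * u_hat_{t-1}(1).
    `start` plays the role of the forecast made before u[0] and is an
    assumed initial condition, not a value taken from the text.
    """
    forecasts = []
    previous = start
    for value in u:
        previous = (1.0 - theta) * value + theta * previous
        forecasts.append(previous)
    return forecasts


# Illustrative data only, with theta_u = 0.5 as in the example.
example = ewma_forecasts([2.0, 4.0], theta=0.5, start=0.0)
```

Because each forecast is a weighted average of the newest observation and the previous forecast, the influence of the assumed starting value dies out geometrically at rate $\theta_u$.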
Thus, the control equation (15.5.6) can be written finally as

$$X_{t+} = -\frac{(1 - 0.7B)\,0.0016\,(0.5)}{0.0024\,(1 - 0.5B)}\,u_t$$

or

$$X_{t+} = 0.5X_{t-1+} - 0.333(u_t - 0.7u_{t-1}) \tag{15.5.7}$$

Table 15.3 shows the calculation of the first few of a series of settings of the pressure
required to compensate the variations in feed concentration, given the starting conditions
for time $t = 0$ of $u_0 = 1.6$, $X_{0+} = -0.63$. Once the calculation has been started off, it is
sometimes more convenient to work directly with the changes $x_t$ to be made at time $t$ using

$$x_t = 0.5x_{t-1} - 0.333(\nabla u_t - 0.7\nabla u_{t-1}) \tag{15.5.8}$$

TABLE 15.3 Calculation of Adjustments for Feedforward Control Scheme (15.5.7)

  t    Concentration    u_t     X_{t+}    Pressure         x_t
       u_t + 30                           X_{t+} + 25
  0        31.6          1.6     -0.63        24.4
  1        31.1          1.1     -0.31        24.7          0.3
  2        34.4          4.4     -1.36        23.6         -1.1
  3        32.0          2.0     -0.32        24.7          1.1
  4        28.2         -1.8      0.90        25.9          1.2
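The arithmetic behind Table 15.3 can be reproduced with a few lines of Python. The sketch below applies recursion (15.5.7) to the tabulated concentrations; the function and variable names are hypothetical, and taking the changes as differences of the rounded pressure settings is an assumption that happens to match the printed column.

```python
def feedforward_settings(u, x_start):
    """Pressure settings X_{t+} from (15.5.7), as deviations from 25 psi.

    u       : feed concentration deviations u_0, u_1, ... (from 30 g/L)
    x_start : starting setting X_{0+}
    """
    settings = [x_start]
    for t in range(1, len(u)):
        settings.append(0.5 * settings[-1] - 0.333 * (u[t] - 0.7 * u[t - 1]))
    return settings


u = [1.6, 1.1, 4.4, 2.0, -1.8]             # u_t column of Table 15.3
X = feedforward_settings(u, x_start=-0.63)
pressure = [round(25.0 + value, 1) for value in X]   # settings in psi
# Changes x_t for t = 1..4, taken from the rounded settings (an assumption).
changes = [round(pressure[t] - pressure[t - 1], 1) for t in range(1, len(pressure))]

for t, (level, psi) in enumerate(zip(X, pressure)):
    print(t, round(level, 2), psi)
```

Working with the unrounded settings instead would give a change of about 1.0 rather than 1.1 at $t = 3$, which is why the rounding convention is called out here.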


Figure 15.15a shows a section of the feed concentration. Figure 15.15b shows the output
after applying feedforward control. Figure 15.15c shows the specific gravity if no control
had been applied. These values $Y_t$ are, of course, not directly available but may be obtained
in general from the values $Y'_t$ which actually occurred using

$$Y_t = Y'_t + \hat{u}'_{t-f-1}(f+1-b)$$

For this example then

$$Y_t = Y'_t + \frac{0.0008}{1 - 0.5B}\,u_{t-1}$$

that is,

$$Y_t = 0.5Y_{t-1} + Y'_t - 0.5Y'_{t-1} + 0.0008u_{t-1}$$

As a result of feedforward control, the root mean square error deviation of the output from 
the target value over the sample record shown is 0.003. Over the same period, the root 
mean square error of the uncorrected series would have been 0.008. The improvement is 
marked and extremely worthwhile. However, it appears that other unidentified sources of 
disturbance exist in the process, as evidenced by the drift away from target. This kind of 
tendency is frequently met in pure feedforward control schemes, but may be compensated 
by the addition of feedback control, as discussed in Section 15.2. We will briefly indicate 
the details of the combined scheme later in Section 15.5.4. 

Control action is effected in whatever manner is most suited to the situation. If changes
are made infrequently, and if the control equation is fairly simple as in the above example,
the theory we have outlined may be used to obtain optimal control manually. It is then
convenient to use some form of control chart or nomogram that can be easily understood
by the process operator, similar to charts illustrated in Section 15.2 regarding feedback
control.

FIGURE 15.15 (a) Feed concentration, (b) Specific gravity after feedforward control, (c) Specific
gravity if no control had been applied.

15.5.3 Feedforward Control with Multiple Inputs 

No difficulty arises in principle when the effects of several additive input disturbances
$u_1, u_2, \ldots, u_m$ are to be compensated by changes in $X$ using feedforward control. Suppose
the combined effect at the output of all the input disturbances is given by

$$y_t = \sum_{j=1}^{m}\delta_j^{-1}(B)\omega_j(B)B^{b_j}u_{j,t} = \sum_{j=1}^{m}B^{b_j}u'_{j,t}$$

where $u'_{j,t} = \delta_j^{-1}(B)\omega_j(B)u_{j,t}$, and, as before, the transfer function model for the
compensating variable contributes the effect

$$y_{2t} = L_1^{-1}(B)L_2(B)B^{f+1}X_{t+}$$

Then, proceeding precisely as before, the required control action is to change $X$ at time $t$
by an amount

$$x_t = -L_1(B)L_2^{-1}(B)\sum_{j=1}^{m}\,[u'_{j,t+f+1-b_j} - u'_{j,t+f-b_j}] \tag{15.5.9}$$

where

$$[u'_{j,t+f+1-b_j} - u'_{j,t+f-b_j}] =
\begin{cases}
u'_{j,t+f+1-b_j} - u'_{j,t+f-b_j} & f+1-b_j \leq 0 \\
\hat{u}'_{j,t}(f+1-b_j) - \hat{u}'_{j,t-1}(f+1-b_j) & f+1-b_j > 0
\end{cases} \tag{15.5.10}$$

If, as before, $N_t$ is an unmeasurable disturbance, then the error or deviation from target at
the output from this control action in the compensating variable $X_{t+}$ will be

$$\varepsilon_t = N_t + \sum_{j=1}^{m}e'_{j,t-f-1}(f+1-b_j) \tag{15.5.11}$$

where $e'_{j,t-f-1}(f+1-b_j) = 0$ if $f+1-b_j \leq 0$, and is the forecast error corresponding
to the $j$th input variable $u_{j,t}$ if $f+1-b_j > 0$.

On the one hand, feedforward control allows us to take prompt action to cancel the
effect of input disturbance variables, and if $f+1-b_j \leq 0$, to anticipate completely such
disturbances, at least in theory. On the other hand, to use this type of control we must be
able to measure the disturbing variables and possess complete knowledge (or at least a
good estimate) of the relationship between each input disturbance variable and the output.
In practice, we could never measure all of the disturbances that affected the system.
The remaining disturbances, which we have denoted by $N_t$ and which are not affected by
feedforward control, could of course increase the variance at the output or cause the process
to wander off target, as in fact occurred in the example discussed in Section 15.5.2. Clearly,
we can prevent this from happening by using the deviations $\varepsilon_t$ themselves to indicate an
appropriate adjustment, that is, by using feedback control as discussed in earlier sections of
this chapter. In fact, a combined feedforward-feedback control scheme can be used, which
provides for the elimination of identifiable input disturbances by feedforward control and
for the reduction of the remaining disturbance by feedback control.


15.5.4 Feedforward-Feedback Control 

A combined feedforward-feedback control scheme provides for the elimination of
identifiable input disturbances by feedforward control and for the reduction of the remaining
disturbance by feedback control. We briefly discuss a combined feedforward-feedback
scheme in which $m$ identifiable input disturbances $u_1, u_2, \ldots, u_m$ are fed forward. The
combined effects on the output of all the input disturbances and of the compensating input
variable $X_t$ are assumed to be additive of the same form as given previously in Section
15.5.3. It is assumed also that $N'_t$ is a further unidentified disturbance and that the
augmented noise $N_t$ is made up of $N'_t$ plus that part of the feedforward disturbance that
cannot be predicted at time $t$. Thus, using (15.5.11),

$$N_t = N'_t + \sum_{j=1}^{m}e'_{j,t-f-1}(f+1-b_j)$$

where $e'_{j,t-f-1}(f+1-b_j) = 0$ if $f+1-b_j \leq 0$, and includes any further contributions
from errors in forecasting the identifiable inputs. It is assumed that $N_t$ can be represented
by a linear stochastic process so that, in the notation of Section 15.2.4, it follows that the
relationship between the forecasts of this noise process and the forecast errors may be
written as

$$\frac{L_3(B)(1-B)}{L_4(B)}\,\varepsilon_t = \hat{N}_t(f+1) - \hat{N}_{t-1}(f+1)$$

where $\varepsilon_t = e_{t-f-1}(f+1) = N_t - \hat{N}_{t-f-1}(f+1)$.

Arguing as in (15.2.16) and (15.5.9), the optimal control action for the compensating
input variable $X_t$ to minimize the mean square error at the output is

$$x_t = -\frac{L_1(B)}{L_2(B)}\left\{\sum_{j=1}^{m}\,[u'_{j,t+f+1-b_j} - u'_{j,t+f-b_j}] + \frac{L_3(B)(1-B)}{L_4(B)}\,\varepsilon_t\right\} \tag{15.5.12}$$

where the $[u'_{j,t+f+1-b_j} - u'_{j,t+f-b_j}]$ are as given in equation (15.5.10). The first term in
the control equation (15.5.12) is the same as in (15.5.9) and compensates for changes
in the feedforward input variables. The second term in (15.5.12) corresponds exactly to
(15.2.16) and compensates for that part $N'_t$ of the augmented noise, which can be predicted
at time $t$.


An Example of Feedforward-Feedback Control. We illustrate by discussing further the
example used in Section 15.5.2, where it was desired to control specific gravity as close
as possible to a target value 1.260. Study of the deviations from target occurring after
feedforward control showed that they could be represented by the IMA(0,1,1) process

$$\nabla N_t = (1 - 0.5B)a_t$$

where $a_t$ is a white noise process. Thus,

$$\frac{L_3(B)(1-B)}{L_4(B)}\,\varepsilon_t = \hat{N}_t(1) - \hat{N}_{t-1}(1) = 0.5a_t$$

and $\varepsilon_t = e_{t-1}(1) = a_t$. As in Section 15.5.2, the remaining parameters are

$$\delta^{-1}(B)\omega(B) = 0.0016 \qquad b = 0$$

$$L_2^{-1}(B)L_1(B) = \frac{1 - 0.7B}{0.0024} \qquad f = 0$$

and

$$\hat{u}_t(1) - \hat{u}_{t-1}(1) = \frac{0.5}{1 - 0.5B}\,(u_t - u_{t-1})$$

Using (15.5.12), the minimum mean square error adjustment incorporating feedforward
and feedback control is

$$x_t = -\frac{1 - 0.7B}{0.0024}\left\{\frac{(0.0016)(0.5)}{1 - 0.5B}\,(u_t - u_{t-1}) + 0.5\varepsilon_t\right\} \tag{15.5.13}$$

that is,

$$x_t = 0.5x_{t-1} - 0.333(1 - 0.7B)(u_t - u_{t-1}) - 208(1 - 0.7B)(1 - 0.5B)\varepsilon_t$$
or

$$x_t = 0.5x_{t-1} - 0.333u_t + 0.566u_{t-1} - 0.233u_{t-2} - 208\varepsilon_t + 250\varepsilon_{t-1} - 73\varepsilon_{t-2} \tag{15.5.14}$$

FIGURE 15.16 Typical variation in specific gravity with (a) no control, (b) feedforward control
only, and (c) feedforward with feedback control.
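The coefficients in (15.5.14) follow from multiplying out the operator polynomials in the preceding equation. The Python sketch below checks that expansion; the helper `poly_mul` and the variable names are illustrative assumptions, and the gains are computed from the unrounded ratios, so the printed values may differ from the text in the last digit.

```python
def poly_mul(a, b):
    """Multiply two polynomials in B given as coefficient lists (constant first)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, a_i in enumerate(a):
        for j, b_j in enumerate(b):
            out[i + j] += a_i * b_j
    return out


ff_gain = 0.0016 * 0.5 / 0.0024   # feedforward gain, about 0.333
fb_gain = 0.5 / 0.0024            # feedback gain, about 208

# (1 - 0.7B)(1 - B) acts on u_t; (1 - 0.7B)(1 - 0.5B) acts on the output error.
u_coeffs = [-ff_gain * c for c in poly_mul([1.0, -0.7], [1.0, -1.0])]
e_coeffs = [-fb_gain * c for c in poly_mul([1.0, -0.7], [1.0, -0.5])]

print([round(c, 3) for c in u_coeffs])
print([round(c) for c in e_coeffs])
```

The first list holds the coefficients of $u_t, u_{t-1}, u_{t-2}$ and the second those of the current and two previous output errors.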

Figure 15.16 shows the section of record previously given in Figure 15.15, when only
feedforward control was employed, and the corresponding calculated variation that would
have occurred if no control had been applied. This is now compared with a record from a
scheme using both feedforward and feedback control. The introduction of feedback control
resulted in a further substantial reduction in mean square error and corrected the tendency
to drift from the target, which was experienced with the feedforward scheme.

Note that with a feedback scheme, the correction employs a forecast having lead time
$f + 1$, whereas with a feedforward scheme the forecast has lead time $f + 1 - b$ and no
forecasting is involved if $f + 1 - b$ is zero or negative. Thus, feedforward control gains in
the immediacy of possible adjustment whenever $b$ is greater than zero. The example we
have quoted is an exception in that $b = 0$, and consequently no advantage of immediacy is
gained, in this case, by feedforward control. It might be true in this case that equally good
control could have been obtained by a feedback scheme alone. In practice, possibly because
of error transmission problems, the mixed scheme did rather better than the pure feedback
system.

15.5.5 Advantages and Disadvantages of Feedforward and Feedback Control 

With feedback control, it is the total disturbance, as evidenced by the error at the output,
that actuates compensation. Therefore, it is not necessary to be able to identify and measure
the sources of disturbance. All that is needed is that we characterize the disturbance $N_t$
at the output by an appropriate stochastic model (and as we have seen in earlier sections,
an IMA(0,1,1) model would often provide adequate approximation to the noise model).
Because we are not relying on "dead reckoning," unexpected disturbances and moderate
errors in identifying and estimating the system's characteristics will normally result only in
greater variation about the target value and not (as may occur with feedforward control) in a
consistent drift away from the target value. On the other hand, especially if the delay $f + 1$
is large, the errors about the target (since they are then the errors of a remote forecast) may
be large, although they have zero mean. Clearly, if identifiable sources of input disturbance
can be partially or wholly eliminated by feedforward control, then this should be done.
Then, only the unidentifiable error has to be dealt with by feedback control.

In summary, although we can design a feedback control scheme that is optimal, in the
sense that it is the best possible feedback scheme, it will not usually be as good as a combined
feedforward-feedback scheme in which the sources of error that can be eliminated are
removed before the feedback loop.


15.5.6 Remarks on Fitting Transfer Function-Noise Models Using Operating Data 

It is desirable that the parameters of a control system be estimated from data collected 
under as nearly as possible the conditions that will apply when the control scheme is 
in actual operation. The calculated control action, using estimates so obtained, properly 
takes account of noise in the system, which will be characterized as if it entered at the 
point provided for in the model. This being so, it is desirable to proceed iteratively in the 
development of a control scheme. Using technical knowledge of the process, together with 
whatever can be learned from past operating data, preliminary transfer function and noise 
models are postulated and used to design a pilot control scheme. The operation of this pilot 
scheme can then be used to supply further data, which may be analyzed to give improved 
estimates of the transfer function and noise models, and then used to plan an improved 
control scheme. 

For example, consider a feedforward-feedback scheme with a single feedforward input,
as in Section 15.5.1, and the case with $b - f - 1$ nonnegative. Then for any inputs $u_t$ and
$X_{t+}$, the output deviation from target is given by

$$\varepsilon_t = \delta^{-1}(B)\omega(B)u_{t-b} + L_1^{-1}(B)L_2(B)X_{t-f-1+} + N_t \tag{15.5.15}$$

and it is assumed that the noise $N_t$ may be described by an ARIMA($p$, $d$, $q$) model. It is
supposed that time series data are available for $\varepsilon_t$, $u_t$, and $X_{t+}$ during a sufficiently long
period of actual plant operation. Often, although not necessarily, this would be a period during
which some preliminary pilot control scheme was being operated. Then for specified orders
of transfer function operators and noise model, the methods of Sections 12.3 and 12.4 may
be used directly to construct the sums of squares and likelihood function and to obtain
estimates of the model parameters in the standard way through nonlinear estimation using
numerical iterative calculation.

Consider now a pure feedback system that may be represented in the transfer
function-noise model form

$$\varepsilon_t = v(B)X_{t+} + N_t \tag{15.5.16}$$

$$X_{t+} = c(B)\varepsilon_t\,\{+\,d_t\} \tag{15.5.17}$$

with

$$v(B) = L_1^{-1}(B)L_2(B)B^{f+1}$$

where $c(B)$ is the known operator of the controller, not necessarily optimal, and $d_t$ is
either an additional unintended error or an added "dither" signal that has been deliberately
introduced. The curly brackets in (15.5.17) emphasize that the added term may or may not
be present. In either case, estimation of the unknown transfer function and noise model
parameters can be performed, as described in Chapter 12.

However, difficulties in estimation of the model under feedback conditions can arise
when the added term $d_t$ is not present. To better understand the nature of issues involved
in fitting of the model, we can substitute (15.5.17) in (15.5.16) to obtain

$$[1 - v(B)c(B)]\varepsilon_t = \psi(B)a_t\,\{+\,v(B)d_t\} \tag{15.5.18}$$

First consider the case where $d_t$ is zero. Because, from (15.5.17), $X_{t+}$ is then a deterministic
function of the $\varepsilon_t$'s, the model (which appears in (15.5.16) to be of the transfer function
form) is seen in (15.5.18) to be equivalent to an ARIMA model whose coefficients are
functions of the known parameters of $c(B)$ and of the unknown dynamic and stochastic
noise parameters of the model. It is then apparent that, with $d_t$ absent, estimation difficulties
can arise, as all dynamic and stochastic noise model forms $v_0(B)$ and $\psi_0(B)$, which are
such that

$$\psi_0^{-1}(B)[1 - v_0(B)c(B)] = \psi^{-1}(B)[1 - v(B)c(B)] \tag{15.5.19}$$

will fit equally well in theory. In particular, it can be shown (Box and MacGregor, 1976)
that as the pilot feedback controller used during the generation of the data approaches near
optimality, near singularities occur in the sum-of-squares surface used for estimation of
model parameters. The individual parameters may then be estimated only very imprecisely
or will be nonestimable in the limit. In these circumstances, however, accurate estimates
of those functions of the parameters that are the constants of the feedback control equation
may be obtainable. Thus, while data collected under feedback conditions may be inadequate
for estimating the individual dynamic and stochastic noise parameters of the system, it may
nevertheless be used for updating the estimates of the constants of a control equation whose
mathematical form is assumed known.
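The identification problem expressed by (15.5.19) can be made concrete with a small simulation. The Python sketch below is only an illustration under assumed values: a proportional controller $c(B) = c$, a transfer function $v(B) = gB$, and a stationary first-order autoregressive noise are chosen for simplicity and are not taken from the text. It compares a system in which the input genuinely acts and the noise is white with a system in which the input has no effect and the noise is autoregressive.

```python
import random


def simulate_closed_loop(shocks, g, c, noise_ar):
    """Closed-loop errors for e_t = g * X_{t-1+} + N_t with X_{t+} = c * e_t.

    The noise follows N_t = noise_ar * N_{t-1} + a_t (white noise if noise_ar = 0).
    No dither signal d_t is added.
    """
    errors = []
    x_prev = 0.0
    n_prev = 0.0
    for a in shocks:
        noise = noise_ar * n_prev + a
        error = g * x_prev + noise
        errors.append(error)
        x_prev = c * error
        n_prev = noise
    return errors


rng = random.Random(0)
shocks = [rng.gauss(0.0, 1.0) for _ in range(200)]
c = -0.8  # known proportional controller (assumed value)

# System A: the input really acts (g = 0.5) and the noise is white.
errors_a = simulate_closed_loop(shocks, g=0.5, c=c, noise_ar=0.0)
# System B: the input has no effect (g = 0) and the noise is AR(1) with parameter 0.5 * c.
errors_b = simulate_closed_loop(shocks, g=0.0, c=c, noise_ar=0.5 * c)

largest_gap = max(abs(x - y) for x, y in zip(errors_a, errors_b))
```

Both systems satisfy the same closed-loop relation $(1 - gcB)\varepsilon_t = a_t$, so the two error series coincide and no analysis of the errors alone could separate the two explanations.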

The situation can be much improved by the deliberate introduction during data generation
of a random signal $d_t$ as in (15.5.17). To achieve this, the action $c(B)\varepsilon_t$ is first computed
according to the control equation and then $d_t$ is added on. The added signal can, for example,
be a random normal variate or a random binary variable and should have mean zero and
variance small enough so as not to unduly upset the process. We see from (15.5.18) that
with $d_t$ present, the estimation procedure based on fitting model (15.5.16) now involves a
genuine transfer function model form in which $\varepsilon_t$ depends on the random input $d_t$ as well
as on the random shocks $a_t$. Thus, with $d_t$ present, the fitting procedure tacitly employs not
only information arising from the autocorrelations of the $\varepsilon_t$'s but also additional information
associated with the cross-correlations of the $\varepsilon_t$'s and the $d_t$'s.

In many examples, data from a pilot scheme are used to re-estimate parameters with
the model form already identified from open-loop (no feedback control loop) data and
from previous knowledge of the system. Considerable caution and care is needed in using
closed-loop data in the model identification/specification process itself. In the first place,
if $d_t$ is absent, it is apparent from (15.5.16) that cross-correlation of the "output" $\varepsilon_t$ and
the "input" $X_{t+}$ with or without prewhitening will tell us (what we already know) about
$c(B)$ and not, as might appear if (15.5.16) were treated as defining an open-loop system,
about $v(B)$. Furthermore, since the autocorrelations of the $\varepsilon_t$ will be the same for all model
forms satisfying (15.5.19), unique identification is not possible if nothing is known about
the form of either $\psi(B)$ or $v(B)$. On the other hand, if either $\psi(B)$ or $v(B)$ is known, the
autocorrelation function can be used for the identification of the other. With $d_t$ present,
the form of (15.5.18) is that of a genuine transfer function-noise model considered in
Chapter 12 and corresponding methods may be used for identification.


15.6 MONITORING VALUES OF PARAMETERS OF FORECASTING AND 
FEEDBACK ADJUSTMENT SCHEMES 

Earlier we mentioned the complementary roles of process adjustment and process
monitoring. This symbiosis is further illustrated if we again consider the need to monitor the
adjustment scheme itself. It has often been proposed that the series of residual deviations
from the target from such schemes (and similarly the errors from forecasting schemes)
should be studied and that a Shewhart chart or more generally a cumulative sum or other
monitoring chart should be run on the residual errors to warn of changes. The cumulative
sum is, of course, appropriate to look for small changes in mean level, but often other kinds
of discrepancies may be feared. A general theory of sequential directional monitoring based
on a cumulative Fisher score statistic (Cuscore) was proposed by Box and Ramirez (1992)
(see also Bagshaw and Johnson, 1977).

Suppose that a model can be written in the form of deviations $e_t$ that depend on an
unknown parameter $\theta$ as

$$e_t = e_t(\theta) \tag{15.6.1}$$

and that if the correct value of the parameter $\theta = \theta_0$ is employed in the model,
$\{e_t\} = \{a_t\}$ is a sequence of Normal iid random variables. Then, the cumulative score
statistic appropriate to detect a departure from the value $\theta_0$ may be written

$$Q_t = \sum_{i=1}^{t}e_i r_i \tag{15.6.2}$$

where $r_t = -(\partial e_t/\partial\theta)|_{\theta=\theta_0}$ may be called the detector signal.

For example, suppose that we wished to detect a shift in a mean from a value $\theta_0$ for the
simple model $y_t = \theta + e_t$. We can write

$$e_t = e_t(\theta) = y_t - \theta \qquad a_t = y_t - \theta_0 \tag{15.6.3}$$

Then, in this example, the detector signal is $r_t = 1$ and $Q_t = \sum_{i=1}^{t}e_i$, the well-known
cumulative sum statistic.

In general, for some value of $\theta$ close to $\theta_0$, since $e_t$ may be approximated by
$e_t = a_t - (\theta - \theta_0)r_t$, the cumulative product in (15.6.2) will contain a part

$$-(\theta - \theta_0)\sum_{i=1}^{t}r_i^2 \tag{15.6.4}$$

which systematically increases in magnitude with sample size $t$ when $\theta$ differs from $\theta_0$.
For illustration, consider the possibility that in the feedback control scheme for metallic
thickness of Section 15.2.1, the value of $\lambda$ (estimated as 0.2) may have changed during the
period $t = 1$ to $t = 100$. For this example,

$$e_t(\theta) = \frac{1 - B}{1 - \theta B}\,N_t \tag{15.6.5}$$

Thus,

$$r_t = -\frac{1 - B}{(1 - \theta B)^2}\,N_{t-1} = -\frac{e_{t-1}}{1 - \theta B} = -\frac{\hat{e}_{t-1}(1)}{\lambda} \tag{15.6.6}$$

where $\hat{e}_{t-1}(1) = \lambda(1 - \theta B)^{-1}e_{t-1}$ is an EWMA of past $e_t$'s. The cumulative score
(Cuscore) statistic for detecting this departure is, therefore,

$$Q_t = -\frac{1}{\lambda}\sum_{i=1}^{t}e_i\hat{e}_{i-1}(1) \tag{15.6.7}$$

where the detector signal $\hat{e}_{t-1}(1)$ is, in this case, the EWMA of past values of the residuals.
These residuals are the deviations from the target plotted on the feedback adjustment chart
of Figure 15.3. The criterion agrees with the commonsense idea that if the model is true,
then $e_t = a_t$ and $e_t$ is not predictable from previous values. The Cuscore chart shown
in Figure 15.17 suggests that a change in parameter may have occurred at about $t = 40$.
However, we see from the original data of Figure 15.2 that this is very close to the point at
which the level of the original series appears to have changed, and further data and analysis
would be needed to confirm this finding.

FIGURE 15.17 Cuscore monitoring for detecting a change in the parameter $\theta$ used in conjunction
with the adjustment chart of Figure 15.3.
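A Cuscore of this kind is straightforward to accumulate recursively. The Python sketch below is a minimal illustration that takes the detector signal to be minus the EWMA of past residuals divided by $\lambda$, following the sign convention $r_t = -\partial e_t/\partial\theta$ stated for (15.6.2). The function name, the starting value of the EWMA, and the three residuals are assumptions made for the example and are not the data behind Figure 15.17.

```python
def cuscore_ewma(errors, lam, start=0.0):
    """Cuscore path Q_t = -(1/lam) * sum_i e_i * ehat_{i-1}(1).

    ehat_t(1) = lam * e_t + (1 - lam) * ehat_{t-1}(1) is the EWMA of the
    residuals; `start` is an assumed initial value for ehat_0(1).
    """
    theta = 1.0 - lam
    ewma = start
    running = 0.0
    path = []
    for e in errors:
        running += e * ewma            # pairs e_i with the EWMA of earlier residuals
        path.append(-running / lam)
        ewma = lam * e + theta * ewma  # update the EWMA only after it has been used
    return path


# Illustrative residuals only.
q = cuscore_ewma([1.0, 2.0, -1.0], lam=0.2)
```

If the residuals are unpredictable from their own past, the accumulated products have no systematic trend; a sustained slope in the path is what would signal a change in the parameter.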

The important point is that this example shows the partnership of two types of control 
(adjustment and monitoring) and the corresponding two types of statistical inference (esti¬ 
mation and criticism). A further development is to feed back the filtered Cuscore statistic 
to “self-tune” the control equation, but we do not pursue this further here. 


APPENDIX A15.1 FEEDBACK CONTROL SCHEMES WHERE THE 
ADJUSTMENT VARIANCE IS RESTRICTED 

Consider now the feedback control situation where the models for the noise and system
dynamics are again given by (15.2.10) and (15.2.13), so that $\varepsilon_t = y_t + N_t$ with

$$(1 - B)N_t = (1 - \theta B)a_t \quad \text{and} \quad (1 - \delta B)y_t = (1 - \delta)gX_{t-1+}$$

but some restriction of the input variance $\mathrm{var}[x_t]$ is necessary, where $x_t = (1 - B)X_{t+}$. The
unrestricted optimal scheme has the property that the errors in the output
$\varepsilon_t, \varepsilon_{t-1}, \varepsilon_{t-2}, \ldots$ are the uncorrelated random variables $a_t, a_{t-1}, a_{t-2}, \ldots$ and the
variance of the output $\sigma_\varepsilon^2$ has the minimum possible value $\sigma_a^2$. With the restricted schemes,
the variance $\sigma_\varepsilon^2$ will necessarily be greater than $\sigma_a^2$ and the errors
$\varepsilon_t, \varepsilon_{t-1}, \varepsilon_{t-2}, \ldots$ at the output will be correlated.

We will pose our problem as follows: Given that $\sigma_\varepsilon^2$ be allowed to increase to some
value $\sigma_\varepsilon^2 = (1 + c)\sigma_a^2$, where $c$ is a positive constant, we want to find the control scheme
that produces the minimum value for $\sigma_x^2 = \mathrm{var}[x_t]$. Equivalently, the problem is to find an
(unconstrained) minimum of the expression $\sigma_\varepsilon^2 + \alpha\sigma_x^2$, where $\alpha$ is some specified
multiplier that allocates the relative costs of variations in $\varepsilon_t$ and $x_t$.

A15.1.1 Derivation of Optimal Adjustment 

Let the optimal adjustment, expressed in terms of the $a_t$'s, be

$$x_t = -\frac{1}{g}\,L(B)a_t \tag{A15.1.1}$$

where

$$L(B) = l_0 + l_1B + l_2B^2 + \cdots$$

Then, we see that the error $\varepsilon_t$ at the output is given by

$$\begin{aligned}
\varepsilon_t &= \frac{(1 - \delta)g}{1 - \delta B}\,X_{t-1+} + N_t \\
&= -\frac{1 - \delta}{1 - \delta B}\,(1 - B)^{-1}L(B)a_{t-1} + (1 - B)^{-1}(1 - \theta B)a_t \\
&= a_t + \left[\lambda - \frac{L(B)(1 - \delta)}{1 - \delta B}\right]Sa_{t-1}
\end{aligned} \tag{A15.1.2}$$

where $S = (1 - B)^{-1}$. The coefficient of $a_t$ in this expression is unity, so we can write

$$\varepsilon_t = [1 + B\mu(B)]a_t \tag{A15.1.3}$$

where

$$\mu(B) = \mu_1 + \mu_2B + \mu_3B^2 + \cdots$$

Furthermore, in practice, control would need to be exerted in terms of the observed output
errors $\varepsilon_t$ rather than in terms of the $a_t$'s, so that the control equation actually used would
be of the form

$$x_t = -\frac{1}{g}\,\frac{L(B)}{1 + B\mu(B)}\,\varepsilon_t \tag{A15.1.4}$$

Equating (A15.1.2) and (A15.1.3), we obtain

$$(1 - \delta)L(B) = [\lambda - (1 - B)\mu(B)](1 - \delta B) \tag{A15.1.5}$$
Since $\delta$, $g$, and $\sigma_a^2$ are constants, we can proceed conveniently by finding an unrestricted
minimum of

$$C = \frac{(1 - \delta)^2g^2V[x_t] + \nu V[\varepsilon_t]}{\sigma_a^2} \tag{A15.1.6}$$

where, for example, $V[x_t] = \mathrm{var}[x_t]$ and $\nu = (1 - \delta)^2g^2/\alpha$. Now, from (A15.1.3),
$V[\varepsilon_t]/\sigma_a^2 = 1 + \sum_{j=1}^{\infty}\mu_j^2$, while from (A15.1.1),
$(1 - \delta)gx_t = -(1 - \delta)L(B)a_t = -\tau(B)a_t$, so that

$$\frac{(1 - \delta)^2g^2V[x_t]}{\sigma_a^2} = \sum_{j=0}^{\infty}\tau_j^2$$

where

$$\tau(B) = \sum_{j=0}^{\infty}\tau_jB^j = (1 - \delta)L(B) = [\lambda - (1 - B)\mu(B)](1 - \delta B)$$

from (A15.1.5). The coefficients $\{\tau_i\}$ are thus seen to be functionally related to the $\mu_i$ by
the difference equation

$$\mu_i - (1 + \delta)\mu_{i-1} + \delta\mu_{i-2} = -\tau_{i-1} \quad \text{for } i > 2 \tag{A15.1.7}$$

with $\tau_0 = -(\mu_1 - \lambda)$, $\tau_1 = -[\mu_2 - (1 + \delta)\mu_1 + \lambda\delta]$. Hence, we require an unrestricted
minimum, with respect to the $\mu_i$, of the expression

$$C = \sum_{j=0}^{\infty}\tau_j^2 + \nu\left(1 + \sum_{j=1}^{\infty}\mu_j^2\right) \tag{A15.1.8}$$

This can be obtained by differentiating $C$ with respect to each $\mu_i$ ($i = 1, 2, \ldots$), equating
these derivatives to zero and solving the resulting equations. Now, a given $\mu_i$ only influences
the values $\tau_{i+1}$, $\tau_i$, and $\tau_{i-1}$ through (A15.1.7), and we see that

$$\frac{\partial\tau_j}{\partial\mu_i} =
\begin{cases}
-1 & j = i - 1 \\
1 + \delta & j = i \\
-\delta & j = i + 1 \\
0 & \text{otherwise}
\end{cases} \tag{A15.1.9}$$

Therefore, from (A15.1.8) and (A15.1.9), we obtain

$$\begin{aligned}
\frac{\partial C}{\partial\mu_i} &= 2\left[\tau_{i+1}\frac{\partial\tau_{i+1}}{\partial\mu_i} + \tau_i\frac{\partial\tau_i}{\partial\mu_i} + \tau_{i-1}\frac{\partial\tau_{i-1}}{\partial\mu_i} + \nu\mu_i\right] \\
&= 2[-\delta\tau_{i+1} + (1 + \delta)\tau_i - \tau_{i-1} + \nu\mu_i] \quad \text{for } i = 1, 2, \ldots
\end{aligned} \tag{A15.1.10}$$

Then, after substituting the expressions for the $\tau_i$ in terms of the $\mu_i$ from equation (A15.1.7)
in (A15.1.10) and setting each of these equal to zero, we obtain the following equations:

$$(i = 1):\quad -\lambda(1 + \delta + \delta^2) + 2(1 + \delta + \delta^2)\mu_1 - (1 + \delta)^2\mu_2 + \delta\mu_3 + \nu\mu_1 = 0 \tag{A15.1.11}$$

$$(i = 2):\quad \lambda\delta - (1 + \delta)^2\mu_1 + 2(1 + \delta + \delta^2)\mu_2 - (1 + \delta)^2\mu_3 + \delta\mu_4 + \nu\mu_2 = 0 \tag{A15.1.12}$$

$$(i > 2):\quad [\delta B^2 - (1 + \delta)^2B + 2(1 + \delta + \delta^2) - (1 + \delta)^2F + \delta F^2 + \nu]\mu_i = 0 \tag{A15.1.13}$$

A15.1.2 Case Where $\delta$ Is Negligible

Consider first the simpler case where $\delta$ is negligibly small and can be set equal to zero.
Then the equations above can be written as

$$(i = 1):\quad -(\lambda - \mu_1) + (\mu_1 - \mu_2) + \nu\mu_1 = 0 \tag{A15.1.14}$$

$$(i > 1):\quad [B - (2 + \nu) + F]\mu_i = 0 \tag{A15.1.15}$$

These difference equations have a solution of the form

$$\mu_i = A_1\kappa_1^i + A_2\kappa_2^i$$

where $\kappa_1$ and $\kappa_2$ are the roots of the characteristic equation

$$B^2 - (2 + \nu)B + 1 = 0 \tag{A15.1.16}$$

that is, of

$$B + B^{-1} = 2 + \nu$$

Evidently, if $\kappa$ is a root, so is $\kappa^{-1}$. Thus, the solution is of the form
$\mu_i = A_1\kappa^i + A_2\kappa^{-i}$. Now if $\kappa$ has modulus less than or equal to 1, $\kappa^{-1}$ has modulus
greater than or equal to 1, and since $\varepsilon_t = [1 + B\mu(B)]a_t$ must have finite variance, $A_2$ must
be zero with $|\kappa| < 1$. By substituting the solution $\mu_i = A_1\kappa^i$ in (A15.1.14), we find that
$A_1 = \lambda$.

Finally, then, /+ = Ak 1 , and since /./,■ and A must be real, so must the root k. Hence, 

a ,(£) = Xk 0 < v < 1 (A15.1.17) 

1 — kB 

, „ , . AkB 1 — 9kB , . , ^ , 

l + Bfi(B)=l + - --=--- (A15.1.18) 

1 — kB 1 — kB 


where 0=1 — A. Thus, 


1 - 0kB 

e, = - a, 

' 1 - kB ’ 


so that 


V(e t ] 


2J2 


1 + 


A~k 


1 — K 2 


(A15.1.19) 



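Also, using (A15.1.5) with $\delta = 0$,

$$L(B) = \lambda - \frac{(1 - B)\lambda\kappa}{1 - \kappa B} = \frac{\lambda(1 - \kappa)}{1 - \kappa B} \qquad \text{(A15.1.20)}$$

Thus,

$$x_t = -\frac{\lambda}{g}\,\frac{1 - \kappa}{1 - \kappa B}\,a_t$$

and

$$\frac{V[x_t]}{\sigma_a^2} = \frac{\lambda^2}{g^2}\,\frac{(1 - \kappa)^2}{1 - \kappa^2} = \frac{\lambda^2}{g^2}\,\frac{1 - \kappa}{1 + \kappa} \qquad \text{(A15.1.21)}$$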


Using (A15.1.4) with (A15.1.18) and (A15.1.20), we now find that the optimal control
action, in terms of the observed output error $e_t$, is

$$x_t = -\frac{1}{g}\,\frac{\lambda(1 - \kappa)}{1 - \theta\kappa B}\,e_t$$

that is,

$$x_t = (1 - \lambda)\kappa x_{t-1} - \frac{\lambda}{g}(1 - \kappa)e_t \qquad \text{(A15.1.22)}$$

Note that the constrained control equation differs from the unconstrained one in two
respects:

1. A new factor $(1 - \lambda)\kappa x_{t-1}$ is introduced, thus making present action depend partly
on previous action.

2. The constant determining the amount of integral control is reduced by a factor $1 - \kappa$.
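The variances (A15.1.19) and (A15.1.21) can be checked by tracing a single unit shock through the closed loop operated with (A15.1.22). The sketch below is only an illustration under stated assumptions: $\delta = 0$, so that the output level is $g X_{t-1}$; the noise follows $\nabla N_t = (1 - \theta B)a_t$; and the function name and the numerical values of $\lambda$, $\kappa$, and $g$ are chosen here for the example.

```python
def closed_loop_impulse(lam, kappa, g=1.0, n=400):
    """Responses of e_t and x_t to a unit shock a_0 = 1 (all other a_t = 0).

    Assumes delta = 0, so Y_t = g * X_{t-1}, with noise
    N_t = N_{t-1} + a_t - theta * a_{t-1} and theta = 1 - lam.
    """
    theta = 1.0 - lam
    e, x = [], []
    N = 0.0        # noise level N_t
    X = 0.0        # cumulative adjustment; holds X_{t-1} when e_t is formed
    x_prev = 0.0   # previous adjustment x_{t-1}
    for t in range(n):
        a_t = 1.0 if t == 0 else 0.0
        a_prev = 1.0 if t == 1 else 0.0
        N += a_t - theta * a_prev
        e_t = N + g * X
        # control equation (A15.1.22)
        x_t = (1.0 - lam) * kappa * x_prev - (lam / g) * (1.0 - kappa) * e_t
        X += x_t
        x_prev = x_t
        e.append(e_t)
        x.append(x_t)
    return e, x


lam, kappa, g = 0.4, 0.612, 2.0
e, x = closed_loop_impulse(lam, kappa, g)
var_e = sum(v * v for v in e)   # V[e_t] / sigma_a^2
var_x = sum(v * v for v in x)   # V[x_t] / sigma_a^2
```

Because the system is linear, the sums of squared impulse responses equal the variance ratios, so `var_e` and `var_x` can be compared directly with the closed-form expressions.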

We have supposed that the output variance is allowed to increase to some value $\sigma_a^2(1 + c)$.
It follows from (A15.1.19) that

$$c = \frac{\lambda^2\kappa^2}{1 - \kappa^2}$$

that is,

$$\kappa = \sqrt{\frac{c}{\lambda^2 + c}}$$

where the positive square root is to be taken. It is convenient to write $Q = c/\lambda^2$. Then,
$Q = \kappa^2/(1 - \kappa^2)$ and $\kappa^2 = Q/(1 + Q)$, and the output variance becomes $\sigma_a^2(1 + \lambda^2 Q)$.

In summary, suppose that we are prepared to tolerate an increase in variance in the
output to some value $\sigma_a^2(1 + \lambda^2 Q)$; then

1. We compute $\kappa = \sqrt{Q/(1 + Q)}$.

2. Optimal control will be achieved by taking action given by (A15.1.22).

3. The variance of the input will be reduced to

$$V[x_t] = \frac{\lambda^2}{g^2}\,\frac{1 - \kappa}{1 + \kappa}\,\sigma_a^2$$

that is, it will reduce to a value that is $W$% of that for the unconstrained scheme,
where

$$W = 100\,\frac{1 - \kappa}{1 + \kappa}$$

Table A15.1 Values of Parameters for a Simple Constrained Control Scheme

$c/\lambda^2 = Q$    $\kappa$    $W$        $c/\lambda^2 = Q$    $\kappa$    $W$
0.10             0.302      53.7       0.60             0.612      24.0
0.20             0.408      42.0       0.70             0.641      21.9
0.30             0.480      35.1       0.80             0.667      20.0
0.40             0.535      30.3       0.90             0.688      18.5
0.50             0.577      26.8       1.00             0.707      17.2



Table A15.1 shows $\kappa$ and $W$ for values of $Q$ between 0.1 and 1.0. For illustration,
suppose that $\lambda = 0.4$. Then the optimal unconstrained scheme will employ the control
action

$$x_t = -\frac{0.4}{g}e_t$$

with $e_t = a_t$. The variance of $x_t$ would be $V[x_t] = (\sigma_a^2/g^2)0.16$. Suppose that it was desired
to reduce this by a factor of 4, to the value $(\sigma_a^2/g^2)0.04$. Thus, we require $W$ to be 25%.
Table A15.1 shows that a reduction of the input variance to 24% of its unconstrained
value is possible with $Q = 0.60$ and $\kappa = 0.612$. If we use this scheme, the output variance
will be

$$\sigma_e^2 = \sigma_a^2(1 + 0.16 \times 0.60) = 1.10\sigma_a^2$$

Thus, by the use of the control action

$$x_t = 0.37x_{t-1} - \frac{1}{g}0.16e_t$$

instead of $x_t = -(0.4/g)e_t$, the variance of the input is reduced to about 1/4 of its previous
value, while the variance of the output is increased by only 10%.
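The quantities in Table A15.1 and in the illustration follow directly from $\kappa = \sqrt{Q/(1+Q)}$, so they are easy to tabulate. The following is a small sketch; the function name and the dictionary keys are hypothetical labels introduced here, and the coefficients are those of (A15.1.22).

```python
import math


def simple_constrained_scheme(Q, lam, g=1.0):
    """Constants of the constrained scheme (A15.1.22) when delta is negligible."""
    kappa = math.sqrt(Q / (1.0 + Q))
    return {
        "kappa": kappa,
        "W": 100.0 * (1.0 - kappa) / (1.0 + kappa),      # input variance, % of unconstrained
        "coef_x_lag1": (1.0 - lam) * kappa,               # multiplies x_{t-1}
        "coef_e": -(lam / g) * (1.0 - kappa),             # multiplies e_t
        "output_variance_ratio": 1.0 + lam**2 * Q,        # V[e_t] / sigma_a^2
    }


# Tabulate kappa and W over the range of Q covered by Table A15.1
for Q in [0.1 * k for k in range(1, 11)]:
    s = simple_constrained_scheme(Q, lam=0.4)
    print(f"{Q:4.2f}  {s['kappa']:.3f}  {s['W']:5.1f}")

example = simple_constrained_scheme(0.60, lam=0.4)
```

For the illustration with $\lambda = 0.4$ and $Q = 0.60$, `example` holds the constants of the control action quoted in the text, up to the rounding used there.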


Case Where $\delta$ Is Not Negligible. Consider now the more general situation where $\delta$ is not
negligible and the system dynamics must be taken account of. The difference equation
(A15.1.13) is of the form

$$(\alpha B^{-2} + \beta B^{-1} + \gamma + \beta B + \alpha B^2)\mu_i = 0$$

and if $\kappa$ is a root of the characteristic equation, so is $\kappa^{-1}$. Suppose that the roots are
$\kappa_1, \kappa_2, \kappa_1^{-1}, \kappa_2^{-1}$ and that $\kappa_1$ and $\kappa_2$ are a pair of roots with modulus $< 1$. Then, in the
solution

$$\mu_i = A_1\kappa_1^i + A_2\kappa_2^i + A_3\kappa_1^{-i} + A_4\kappa_2^{-i}$$

$A_3$ and $A_4$ must be zero, because $e_t$ is required to have a finite variance.
Hence, the solution is of the form

$$\mu_i = A_1\kappa_1^i + A_2\kappa_2^i \qquad |\kappa_1| < 1 \quad |\kappa_2| < 1$$

The $A$'s satisfying the initial conditions, defined by (A15.1.11) and (A15.1.12), are obtained
by substitution to give

$$A_1 = \frac{\lambda\kappa_1(1 - \kappa_2)}{\kappa_1 - \kappa_2} \qquad A_2 = -\frac{\lambda\kappa_2(1 - \kappa_1)}{\kappa_1 - \kappa_2}$$

If we write $k_0 = \kappa_1 + \kappa_2 - \kappa_1\kappa_2$, $k_1 = \kappa_1\kappa_2$, then

$$\mu(B) = \lambda\,\frac{k_0 - k_1 B}{1 - (k_0 + k_1)B + k_1 B^2} \qquad \text{(A15.1.23)}$$

$$1 + B\mu(B) = \frac{1 - k_1 B - (1 - \lambda)(k_0 B - k_1 B^2)}{1 - (k_0 + k_1)B + k_1 B^2} \qquad \text{(A15.1.24)}$$

Now substituting (A15.1.23) in (A15.1.5),

$$L(B) = \frac{\lambda(1 - \delta B)(1 - k_0)}{(1 - \delta)[1 - (k_0 + k_1)B + k_1 B^2]} \qquad \text{(A15.1.25)}$$

$$\frac{L(B)}{1 + B\mu(B)} = \frac{\lambda(1 - \delta B)(1 - k_0)}{(1 - \delta)[1 - k_1 B - (1 - \lambda)(k_0 B - k_1 B^2)]}$$

Therefore, using (A15.1.4), we find that the optimal control action in terms of the error $e_t$
is

$$x_t = -\frac{\lambda}{g}\,\frac{(1 - \delta B)(1 - k_0)}{(1 - \delta)[1 - k_1 B - (1 - \lambda)(k_0 B - k_1 B^2)]}\,e_t \qquad \text{(A15.1.26)}$$

or

$$x_t = [k_1 + (1 - \lambda)k_0]x_{t-1} - (1 - \lambda)k_1 x_{t-2} - \frac{\lambda(1 - k_0)(1 - \delta B)}{g(1 - \delta)}\,e_t \qquad \text{(A15.1.27)}$$

Thus, the modified control scheme makes $x_t$ depend on both $x_{t-1}$ and $x_{t-2}$ (only on $x_{t-1}$
if $\lambda = 1$) and reduces the standard integral and proportional action by a factor $1 - k_0$.
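Once $k_0$ and $k_1$ are known, the constants of (A15.1.27) are simple products. A minimal sketch follows; the function name is hypothetical, and the numerical inputs ($\lambda = 0.6$, $\delta = 0.5$, $g = 1$, $k_0 = 0.43$, $k_1 = 0.15$) are illustrative values of the kind tabulated in Table A15.2.

```python
def constrained_control_coefficients(lam, delta, g, k0, k1):
    """Constants of x_t = c1*x_{t-1} - c2*x_{t-2} - gain*(1 - delta*B) e_t, as in (A15.1.27)."""
    c1 = k1 + (1.0 - lam) * k0                       # weight on x_{t-1}
    c2 = (1.0 - lam) * k1                            # weight on x_{t-2} (enters with a minus sign)
    gain = lam * (1.0 - k0) / (g * (1.0 - delta))    # multiplies (e_t - delta * e_{t-1})
    return c1, c2, gain


c1, c2, gain = constrained_control_coefficients(lam=0.6, delta=0.5, g=1.0, k0=0.43, k1=0.15)
```

Setting `k0 = k1 = 0` recovers the unconstrained scheme, in which `c1` and `c2` vanish and `gain` equals $\lambda/[g(1-\delta)]$.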


Variances of Output and Input. The actual variances for the output and input are readily
found since

$$e_t = a_t + \lambda\left[\frac{k_0 - k_1 B}{1 - (k_0 + k_1)B + k_1 B^2}\right]a_{t-1}$$

The second term on the right defines a mixed autoregressive-moving average process of
order (2, 0, 1), the variance for which is readily obtained to give

$$\frac{V[e_t]}{\sigma_a^2} = 1 + \lambda^2\left\{\frac{(k_0 + k_1)^2(1 - k_1) - 2k_1(k_0 - k_1^2)}{(1 - k_1)[(1 + k_1)^2 - (k_0 + k_1)^2]}\right\} = 1 + \lambda^2 Q$$

Also,

$$\frac{V[x_t]}{\sigma_a^2} = \frac{\lambda^2}{g^2(1 - \delta)^2}\,\frac{(1 - k_0)[(1 + \delta^2)(1 + k_1) - 2\delta(k_0 + k_1)]}{(1 + k_0 + 2k_1)(1 - k_1)}$$
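These two variance expressions can be checked against a direct calculation. The sketch below traces a unit shock through the closed loop formed by the noise model $\nabla N_t = (1 - \theta B)a_t$, the dynamics $(1 - \delta B)Y_t = g(1 - \delta)X_{t-1+}$, and the control equation (A15.1.27), and sums the squared responses. The function names and the numerical inputs are assumptions made for the illustration only.

```python
def q_factor(k0, k1):
    """Q in V[e_t] / sigma_a^2 = 1 + lam^2 * Q."""
    s = k0 + k1
    num = s**2 * (1 - k1) - 2 * k1 * (k0 - k1**2)
    den = (1 - k1) * ((1 + k1)**2 - s**2)
    return num / den


def input_variance_ratio(lam, delta, g, k0, k1):
    """Closed-form V[x_t] / sigma_a^2 for the constrained scheme."""
    num = (1 - k0) * ((1 + delta**2) * (1 + k1) - 2 * delta * (k0 + k1))
    den = (1 + k0 + 2 * k1) * (1 - k1)
    return lam**2 / (g**2 * (1 - delta)**2) * num / den


def impulse_sums(lam, delta, g, k0, k1, n=600):
    """Sums of squared responses of e_t and x_t to a unit shock a_0 = 1."""
    theta = 1.0 - lam
    c1 = k1 + (1.0 - lam) * k0
    c2 = (1.0 - lam) * k1
    gain = lam * (1.0 - k0) / (g * (1.0 - delta))
    N = Y = X = 0.0
    e_prev = x1 = x2 = 0.0
    sum_e2 = sum_x2 = 0.0
    for t in range(n):
        a_t = 1.0 if t == 0 else 0.0
        a_prev = 1.0 if t == 1 else 0.0
        N += a_t - theta * a_prev
        Y = delta * Y + g * (1.0 - delta) * X      # X still holds X_{t-1}
        e_t = Y + N
        x_t = c1 * x1 - c2 * x2 - gain * (e_t - delta * e_prev)   # (A15.1.27)
        X += x_t
        sum_e2 += e_t**2
        sum_x2 += x_t**2
        e_prev, x2, x1 = e_t, x1, x_t
    return sum_e2, sum_x2


lam, delta, g, k0, k1 = 0.6, 0.5, 1.0, 0.43, 0.15
ve, vx = impulse_sums(lam, delta, g, k0, k1)
```

The comparison holds for any $k_0$, $k_1$ giving a stable denominator $1 - (k_0 + k_1)B + k_1 B^2$, not only for the optimal pair, because it rests only on the algebra linking $\mu(B)$ and $L(B)$.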

Computation of $k_0$ and $k_1$. Returning to the difference equations (A15.1.13), the
characteristic equation may be written

$$B^4 - MB^3 + NB^2 - MB + 1 = 0 \qquad \text{(A15.1.28)}$$

where $M = (1 + \delta)^2/\delta$ and $N = [(1 + \delta)^2 + (1 + \delta^2) + \nu]/\delta$. It may also be written in the
form

$$(B^2 - TB + P)(B^2 - P^{-1}TB + P^{-1}) = 0 \qquad \text{(A15.1.29)}$$

where

$$T = \kappa_1 + \kappa_2 \quad \text{and} \quad P = \kappa_1\kappa_2$$

Equating coefficients of $B$ gives

$$T + P^{-1}T = M$$

that is, $T = PM/(1 + P)$, and

$$P + P^{-1} + P^{-1}T^2 = N$$

Thus, $P + P^{-1} + PM^2/(1 + P)^2 = N$, that is,

$$(P + 2 + P^{-1})(P + P^{-1}) + M^2 = N(P + 2 + P^{-1})$$

or

$$(P + P^{-1})^2 + (2 - N)(P + P^{-1}) + M^2 - 2N = 0$$

For suitable values of $\nu$, this quadratic equation will have two real roots:

$$u_1 = \kappa_1\kappa_2 + \kappa_1^{-1}\kappa_2^{-1} \qquad u_2 = \kappa_1^{-1}\kappa_2 + \kappa_1\kappa_2^{-1}$$

the root $u_1$ being the larger. The required quantity $P$ is now the smaller root of the quadratic
equation

$$P^2 - u_1 P + 1 = 0$$

and $T$ is given by

$$T = [P(u_2 + 2)]^{1/2}$$

Table A15.2 Table to Facilitate the Calculation of Optimal Constrained Control Schemes

                          $100Q$
$\delta$              20       40       60       80       100
0.9    $100W$    21.7     11.3     6.7      4.5      3.1
       $k_0$     0.44     0.585    0.68     0.74     0.78
       $k_1$     0.18     0.27     0.34     0.39     0.44
0.8    $100W$    22.0     11.7     7.2      4.8      3.4
       $k_0$     0.44     0.585    0.68     0.74     0.78
       $k_1$     0.18     0.27     0.33     0.38     0.43
0.7    $100W$    22.7     12.4     8.0      5.6      4.1
       $k_0$     0.44     0.585    0.68     0.74     0.78
       $k_1$     0.17     0.25     0.32     0.36     0.40
0.6    $100W$    24.1     13.6     9.0      6.6      5.0
       $k_0$     0.44     0.58     0.67     0.73     0.78
       $k_1$     0.16     0.24     0.29     0.33     0.365
0.5    $100W$    26.5     15.5     10.5     7.9      6.2
       $k_0$     0.43     0.58     0.67     0.72     0.77
       $k_1$     0.15     0.21     0.26     0.29     0.32
0.4    $100W$    28.5     17.7     12.7     9.8      7.9
       $k_0$     0.43     0.57     0.66     0.72     0.76
       $k_1$     0.13     0.18     0.22     0.245    0.265
0.3    $100W$    31.5     20.5     15.2     12.0     9.9
       $k_0$     0.43     0.57     0.65     0.71     0.75
       $k_1$     0.105    0.145    0.17     0.19     0.20
0.2    $100W$    34.8     23.6     18.0     14.5     12.2
       $k_0$     0.42     0.56     0.64     0.69     0.73
       $k_1$     0.07     0.10     0.12     0.13     0.14
0.1    $100W$    38.2     26.7     21.0     17.3     14.6
       $k_0$     0.42     0.55     0.63     0.68     0.72
       $k_1$     0.04     0.05     0.06     0.065    0.07

Table of Optimal Values for Constrained Schemes 


Construction of the Table. Table A15.2 is provided to facilitate the selection of an optimal
control scheme. The tabled values were obtained as follows for each chosen value of the
parameter $\delta$ in the transfer function model:

1. Compute $M = (1 + \delta)^2/\delta$ and $N = [(1 + \delta)^2 + (1 + \delta^2) + \nu]/\delta$ for a series of values
of $\nu$ chosen to provide a suitable range for $Q$.

2. Compute $u_1 = \tfrac{1}{2}(N - 2) + \{[(N - 2)/2]^2 + 2N - M^2\}^{1/2}$ and
$u_2 = \tfrac{1}{2}(N - 2) - \{[(N - 2)/2]^2 + 2N - M^2\}^{1/2}$.

3. Compute $k_1 = \tfrac{1}{2}u_1 - [(\tfrac{1}{2}u_1)^2 - 1]^{1/2}$ and
$k_0 = T - P = [k_1(u_2 + 2)]^{1/2} - k_1$.

4. Compute

$$Q = \frac{(k_0 + k_1)^2(1 - k_1) - 2k_1(k_0 - k_1^2)}{(1 - k_1)[(1 + k_1)^2 - (k_0 + k_1)^2]}$$

5. Compute

$$W = \frac{(1 - k_0)[(1 + \delta^2)(1 + k_1) - 2\delta(k_0 + k_1)]}{(1 + k_0 + 2k_1)(1 - k_1)(1 + \delta^2)}$$

6. Interpolate among the $W$, $k_0$, $k_1$ values at convenient values of $Q$.
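Steps 1 to 5 translate directly into a short routine. The sketch below follows those steps for a given $\nu$; in place of the interpolation of step 6 it uses a bisection search on $\nu$ to reach a requested $Q$, which is a substitution made here for convenience and assumes that $Q$ decreases as $\nu$ increases. The function names and the search bounds are hypothetical. Since the tabled values were obtained by interpolation and rounded, small differences from Table A15.2 are to be expected.

```python
import math


def scheme_from_nu(delta, nu):
    """Steps 1-5: return (k0, k1, Q, W) for a given delta and nu (W as a fraction)."""
    M = (1 + delta)**2 / delta
    N = ((1 + delta)**2 + (1 + delta**2) + nu) / delta
    half = (N - 2) / 2
    root = math.sqrt(max(half**2 + 2 * N - M**2, 0.0))
    u1, u2 = half + root, half - root
    k1 = u1 / 2 - math.sqrt(max((u1 / 2)**2 - 1, 0.0))    # smaller root of P^2 - u1*P + 1 = 0
    k0 = math.sqrt(max(k1 * (u2 + 2), 0.0)) - k1          # T - P
    s = k0 + k1
    Q = (s**2 * (1 - k1) - 2 * k1 * (k0 - k1**2)) / ((1 - k1) * ((1 + k1)**2 - s**2))
    W = ((1 - k0) * ((1 + delta**2) * (1 + k1) - 2 * delta * s)
         / ((1 + k0 + 2 * k1) * (1 - k1) * (1 + delta**2)))
    return k0, k1, Q, W


def scheme_for_Q(delta, Q_target, lo=1e-6, hi=1e4, iters=200):
    """Bisection on log(nu) so that Q(nu) matches Q_target (assumes Q falls as nu rises)."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if scheme_from_nu(delta, mid)[2] > Q_target:
            lo = mid
        else:
            hi = mid
    return scheme_from_nu(delta, math.sqrt(lo * hi))


k0, k1, Q, W = scheme_for_Q(delta=0.5, Q_target=0.20)
print(round(k0, 3), round(k1, 3), round(100 * W, 1))
```

Calling `scheme_for_Q` over a grid of $\delta$ and $Q$ values gives entries comparable with those of Table A15.2.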


Use of the Table. Table A15.2 may be used as follows. The value of $\delta$ is entered in the
vertical margin. Using the fact that $V[e_t] = (1 + \lambda^2 Q)\sigma_a^2$, the percentage increase in output
variance is $100Q\lambda^2$. A suitable value of $Q$ is entered in the horizontal margin. The entries
in the table are then (1) $100W$, the percentage reduction in the variance of $x_t$, (2) $k_0$, and
(3) $k_1$.

For illustration, suppose that $\lambda = 0.6$, $\delta = 0.5$, and $g = 1$. The optimal unconstrained
control equation is then

$$x_t = -1.2(1 - 0.5B)e_t = -1.2(1 - 0.5B)a_t$$

and $\mathrm{var}[x_t] = 1.80\sigma_a^2$. Suppose that this amount of variation in the input variable produces
difficulties in process operation and it is desired to reduce $\mathrm{var}[x_t]$ to about $0.50\sigma_a^2$, that is, to
about 28% of the value for the unconstrained scheme. Inspection of Table A15.2 in the row
labeled $\delta = 0.5$ shows that a reduction to 26.5% can be achieved by using a control scheme
with constants $k_0 = 0.43$, $k_1 = 0.15$, that is, by employing the control equation (A15.1.27)
to give

$$x_t = 0.32x_{t-1} - 0.06x_{t-2} - (0.57 \times 1.2)(1 - 0.5B)e_t$$

This solution corresponds to a value $Q = 0.20$. Therefore, the variance at the output will
be increased by a factor of

$$1 + \lambda^2 Q = 1 + 0.6^2(0.2) = 1.072$$

that is, by about 7%.


APPENDIX A15.2 CHOICE OF THE SAMPLING INTERVAL 

In comparison to continuous systems, discrete systems of control, such as those discussed 
here, can be very efficient provided that the sampling interval is suitably chosen. Roughly 
speaking, we want the interval to be such that not too much change can occur during the 
sampling interval. Usually, the behavior of the disturbance that has to pass through all or part 
of the system reflects the inertia or dynamic properties of the system, so that the sampling 
interval will often be chosen tacitly or explicitly to be proportional to the time constant or 
constants of the system. In chemical processes involving reaction and mixing of liquids, 
rather infrequent sampling, say at hourly intervals and possibly with operator surveillance 
and manual adjustment, will be sufficient. By contrast, where reactions between gases 
are involved, a suitable sampling interval may be measured in seconds and automatic 
monitoring and adjustment may be essential. 

In some cases, experimentation may be needed to arrive at a satisfactory sampling
interval, and in others rather simple calculations will show how the choice of sampling
interval will affect the degree of control that is possible.

A15.2.1 Illustration of the Effect of Reducing Sampling Frequency

To illustrate the kind of calculation that is helpful, suppose again that we have a simple
system in which, using a particular sampling interval, the noise is represented by a (0,1,1)
process $\nabla N_t = (1 - \theta B)a_t$ and the transfer function model by the first-order system
$(1 - \delta B)Y_t = g(1 - \delta)X_{t-1+}$. In this case, if we employ the MMSE adjustment

$$x_t = -\frac{1 - \theta}{g(1 - \delta)}(1 - \delta B)e_t \qquad \text{(A15.2.1)}$$

then the deviation from target is $e_t = a_t$ and has variance $\sigma_e^2 = \sigma_1^2$, say.

In practice, the question has often arisen: How much worse off would we be if we took 
samples less frequently? To answer this question, we consider the effect of sampling the 
stochastic process involved. 


A15.2.2 Sampling an IMA(0,1,1) Process 

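Suppose that with observations being made at some "unit" interval, we have a noise model

$$\nabla N_t = (1 - \theta_1 B)a_t$$

with $\mathrm{var}[a_t] = \sigma_a^2 = \sigma_1^2$, where the subscript 1 is used in this context to denote the choice
of sampling interval. Then, for the differences $\nabla N_t$, the autocovariances $\gamma_k$ are given by

$$\gamma_0 = (1 + \theta_1^2)\sigma_1^2$$
$$\gamma_1 = -\theta_1\sigma_1^2 \qquad \text{(A15.2.2)}$$
$$\gamma_j = 0 \qquad j \geq 2$$

Writing $\zeta = (\gamma_0 + 2\gamma_1)/\gamma_1$, we obtain

$$\zeta = -\frac{(1 - \theta_1)^2}{\theta_1}$$

so that, given $\gamma_0$ and $\gamma_1$, the parameter $\lambda = 1 - \theta_1$ of the IMA process may be obtained by
solving the quadratic equation

$$(1 - \theta_1)^2 - \zeta(1 - \theta_1) + \zeta = 0$$

selecting that root for which $-1 < \theta_1 < 1$. Also,

$$\sigma_1^2 = -\frac{\gamma_1}{\theta_1} \qquad \text{(A15.2.3)}$$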
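The recovery of $\theta_1$ and $\sigma_1^2$ from $\gamma_0$ and $\gamma_1$ can be written out as a small routine. This is a minimal sketch, assuming $\gamma_1 \neq 0$ (that is, $\theta_1 \neq 0$); the function name is hypothetical.

```python
import math


def ima_from_autocov(gamma0, gamma1):
    """Recover (theta_1, sigma_1^2) of an IMA(0,1,1) process from the
    autocovariances gamma_0, gamma_1 of its first differences.

    Assumes gamma1 != 0.
    """
    zeta = (gamma0 + 2.0 * gamma1) / gamma1
    disc = zeta * zeta - 4.0 * zeta
    if disc < 0:
        raise ValueError("no real solution for these autocovariances")
    # Roots of lam^2 - zeta*lam + zeta = 0, where lam = 1 - theta_1
    for sign in (1.0, -1.0):
        lam = (zeta + sign * math.sqrt(disc)) / 2.0
        theta = 1.0 - lam
        if -1.0 < theta < 1.0:
            return theta, -gamma1 / theta     # (A15.2.3)
    raise ValueError("no root with -1 < theta < 1")


# Autocovariances implied by theta_1 = 0.5, sigma_1^2 = 2.0, using (A15.2.2)
theta_hat, sigma2_hat = ima_from_autocov((1 + 0.5**2) * 2.0, -0.5 * 2.0)
```

In practice the sample autocovariances of the differenced series would be supplied in place of the theoretical values used here.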

Suppose now that the process $N_t$ is observed at intervals of $h$ units (where $h$ is a positive
integer) and the resulting process is denoted by $M_t$. Then,

$$\nabla M_t = N_t - N_{t-h} = (a_t + a_{t-1} + \cdots + a_{t-h+1}) - \theta_1(a_{t-1} + a_{t-2} + \cdots + a_{t-h})$$

$$\nabla M_{t-h} = N_{t-h} - N_{t-2h} = (a_{t-h} + a_{t-h-1} + \cdots + a_{t-2h+1}) - \theta_1(a_{t-h-1} + \cdots + a_{t-2h})$$

and so on. Then, for the differences $\nabla M_t$, the autocovariances $\gamma_k(h)$ are

$$\gamma_0(h) = [(1 + \theta_1^2) + (h - 1)(1 - \theta_1)^2]\sigma_1^2$$
$$\gamma_1(h) = -\theta_1\sigma_1^2 \qquad \text{(A15.2.4)}$$
$$\gamma_j(h) = 0 \qquad j \geq 2$$

It follows that the process $M_t$ is also an IMA process of order (0,1,1),

$$\nabla M_t = (1 - \theta_h B)e_t$$

where $e_t$ is a white noise process with variance $\sigma_h^2$. Now

$$\frac{\gamma_0(h) + 2\gamma_1(h)}{\gamma_1(h)} = -\frac{h(1 - \theta_1)^2}{\theta_1}$$

so that

$$\frac{h(1 - \theta_1)^2}{\theta_1} = \frac{(1 - \theta_h)^2}{\theta_h} \qquad \text{(A15.2.5)}$$

Also, since $\gamma_1(h) = -\theta_h\sigma_h^2 = -\theta_1\sigma_1^2$, it follows that

$$\frac{\sigma_h^2}{\sigma_1^2} = \frac{\theta_1}{\theta_h} \qquad \text{(A15.2.6)}$$

Therefore, we have shown that the sampling of an IMA process of order (0,1,1) at
interval $h$ produces another IMA process of order (0,1,1). From (A15.2.5), we can obtain the
value of the parameter $\theta_h$ for the sampled process, and from (A15.2.6) we can obtain
the variance $\sigma_h^2 = \mathrm{var}[e_t]$ of the corresponding white noise generating process in terms of
the parameters $\theta_1$ and $\sigma_1^2 = \mathrm{var}[a_t]$ of the original process.
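Relations (A15.2.5) and (A15.2.6) can also be solved directly, since (A15.2.5) is a quadratic in $\theta_h$. The sketch below assumes $0 < \theta_1 < 1$ and takes the root inside the unit interval; the function name is hypothetical.

```python
import math


def sampled_ima(theta1, sigma2_1, h):
    """theta_h and sigma_h^2 for an IMA(0,1,1) process observed every h units.

    Uses (A15.2.5) and (A15.2.6); assumes 0 < theta1 < 1 and integer h >= 1.
    """
    if not 0.0 < theta1 < 1.0:
        raise ValueError("this sketch assumes 0 < theta1 < 1")
    c = h * (1.0 - theta1)**2 / theta1       # equals (1 - theta_h)^2 / theta_h
    b = 2.0 + c                              # theta_h^2 - b*theta_h + 1 = 0
    theta_h = (b - math.sqrt(b * b - 4.0)) / 2.0
    return theta_h, sigma2_1 * theta1 / theta_h


for h in (1, 2, 4):
    theta_h, sigma2_h = sampled_ima(0.5, 1.0, h)
    print(h, round(theta_h, 3), round(sigma2_h, 3))
```

Values obtained this way can be set beside those read from a graph of $\theta_h$ against $\log h$, which are necessarily approximate.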

In Figure A15.1, $\theta_h$ is plotted against $\log h$, a scale of $h$ being appended. The graph
enables one to find the effect of increasing the sampling interval of a (0,1,1) process by
any given multiple. For illustration, suppose that we have a process for which $\theta_1 = 0.5$ and
$\sigma_1^2 = 1$. Let us use the graph to find the values of the corresponding parameters $\theta_2$, $\theta_4$, $\sigma_2^2$, $\sigma_4^2$
when the sampling interval is (a) doubled and (b) quadrupled. Marking on the edge of a
piece of paper the points $h = 1$, $h = 2$, $h = 4$ from the scale of the graph, we set the paper
horizontally so that $h = 1$ corresponds to the point on the curve for which $\theta_1 = 0.5$. We
then read off the ordinates for $\theta_2$ and $\theta_4$ corresponding to $h = 2$ and $h = 4$. We find that

$$\theta_1 = 0.5 \qquad \theta_2 = 0.38 \qquad \theta_4 = 0.27$$

Using (A15.2.6), the variances are in inverse proportion to the values of $\theta$, so that

$$\sigma_1^2 = 1.00 \qquad \sigma_2^2 = 1.32 \qquad \sigma_4^2 = 2.17$$

FIGURE A15.1 Sampling of IMA(0,1,1) process: parameter $\theta_h$ plotted against $\log h$.

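Suppose now that for the original scheme with unit interval, the dynamic constant was $\delta_1$
(again we will use the subscript to denote the sampling interval). Then, since in real time
the same fixed time constant $T = -h/\ln(\delta_h)$ applies to all the schemes, we have

$$\delta_2 = \delta_1^2 \qquad \delta_4 = \delta_1^4$$

The scheme giving minimum mean square error for a particular sampling interval $h$ would
be

$$x_t(h) = -\frac{1 - \theta_h}{g(1 - \delta_1^h)}(1 - \delta_1^h B)e_t(h)$$

or

$$x_t(h) = -\frac{1 - \theta_h}{g}\left[1 + \frac{\delta_1^h}{1 - \delta_1^h}\nabla\right]e_t(h) \qquad \text{(A15.2.7)}$$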


Suppose, for example, with $\theta_1 = 0.5$ as above, $\delta_1 = 0.8$, so that $\delta_2 = 0.64$, $\delta_4 = 0.41$. Then
the optimal schemes would be

$h = 1$:  $x_t(1) = -\frac{0.50}{g}(1 + 4\nabla)e_t(1)$       $\sigma_e^2 = 1.00$    $g^2\sigma_x^2 = 10.25$

$h = 2$:  $x_t(2) = -\frac{0.62}{g}(1 + 1.78\nabla)e_t(2)$    $\sigma_e^2 = 1.32$    $g^2\sigma_x^2 = 5.50$

$h = 4$:  $x_t(4) = -\frac{0.73}{g}(1 + 0.69\nabla)e_t(4)$    $\sigma_e^2 = 2.17$    $g^2\sigma_x^2 = 3.84$
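The constants of these schemes, and the adjustment variances $g^2\sigma_x^2$, can be reproduced from (A15.2.7). In the sketch below the values of $\theta_h$ and $\sigma_h^2$ are simply taken as inputs from the text, and the variance uses the fact that $e_t(h)$ is white noise, so that $(1 + r\nabla)e_t = (1 + r)e_t - re_{t-1}$ has variance $[(1 + r)^2 + r^2]\sigma_h^2$. The function name is hypothetical.

```python
def mmse_scheme(theta_h, sigma2_h, delta1, h, g=1.0):
    """Constants of the MMSE scheme (A15.2.7) for sampling interval h.

    Returns (delta_h, integral_gain, ratio, var_x) where
    x_t = -integral_gain * (1 + ratio * nabla) e_t and var_x = V[x_t].
    """
    delta_h = delta1 ** h
    integral_gain = (1.0 - theta_h) / g
    ratio = delta_h / (1.0 - delta_h)         # proportional-to-integral ratio
    var_x = integral_gain**2 * ((1.0 + ratio)**2 + ratio**2) * sigma2_h
    return delta_h, integral_gain, ratio, var_x


# theta_h and sigma_h^2 as quoted in the text for h = 1, 2, 4 (with g = 1)
inputs = {1: (0.50, 1.00), 2: (0.38, 1.32), 4: (0.27, 2.17)}
schemes = {h: mmse_scheme(th, s2, delta1=0.8, h=h) for h, (th, s2) in inputs.items()}
for h, (d, gain, ratio, var_x) in schemes.items():
    print(h, round(d, 2), round(gain, 2), round(ratio, 2), round(var_x, 2))
```

With $g = 1$ the last returned value is $g^2\sigma_x^2$ itself; small differences from the quoted figures arise from rounding of the inputs.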

In accordance with expectation, as the sampling interval is increased and the dynamics of
the system have relatively less importance, the amount of "integral" control is increased
and the ratio of proportional to integral control is markedly reduced. We noted earlier
that an excessively large adjustment variance $\sigma_x^2$ would usually be a disadvantage. The
values of $g^2\sigma_x^2$ are indicated to show how the schemes differ in this respect. The smaller
value for $\sigma_x^2$ would not of itself, of course, justify the choice $h = 4$. Using an optimal
constrained scheme, as is described in Appendix A15.1, with $h = 1$, a very large reduction
in $\sigma_x^2$ would be produced with only a small increase in the output variance. For example,
entering Table A15.2 with $\delta = 0.8$, $100Q = 20$, we find that for a 5% increase of output
variance to the value $(1 + \lambda^2 Q)\sigma_1^2 = 1.05\sigma_1^2$, the input variance for the scheme with $h = 1$
could be reduced to 22% of its unconstrained value, so that $g^2\sigma_x^2 = 10.25 \times 0.22 = 2.26$.
Using (A15.1.27), we obtain for the constrained scheme with $h = 1$,

$$x_t = 0.40x_{t-1} - 0.09x_{t-2} - 0.56\left[\frac{0.50}{g}(1 + 4\nabla)\right]e_t(1)$$

$$\sigma_e^2 = 1.05 \qquad g^2\sigma_x^2 = 2.26$$

In practice, various alternative schemes could be set out with their accompanying
characteristics and an economic choice made to suit the particular problem. In general, the
increase in output variance that comes with the larger interval would have to be balanced
off against the economic advantage, if any, of less frequent surveillance.


EXERCISES 

15.1. In a chemical process, 30 successive values of viscosity $N_t$ that occurred during a
period when the control variable (gas rate) $X_t$ was held fixed at its standard reference
origin were recorded as follows:

Time     Viscosities
1-10     92   92   96   96   96   98   98   100  100  94
11-20    98   88   88   88   96   96   92   92   90   90
21-30    90   94   90   90   94   94   96   96   96   96


Reconstruct and plot the error sequence (deviations from target) $e_t$ and adjustments
$x_t$, which would have occurred if the optimal feedback control scheme

$$x_t = -10e_t + 5e_{t-1} \qquad (1)$$

had been applied during this period. It is given that the dynamic model is

$$y_t = 0.5y_{t-1} + 0.10x_{t-1} \qquad (2)$$

and that the error signal may be obtained from

$$e_t = e_{t-1} + \nabla N_t + y_t \qquad (3)$$

Your calculation sequence should proceed in the order (2), (3), and (1) and initially
you should assume that $e_1 = 0$, $y_1 = 0$, $x_1 = 0$. Can you devise a more direct way to
compute $e_t$ from $N_t$?

15.2. Given the following combinations of disturbance and transfer function models:

(1)  $\nabla N_t = (1 - 0.7B)a_t$                   $(1 - 0.4B)Y_t = 5.0X_{t-1+}$

(2)  $\nabla N_t = (1 - 0.5B)a_t$                   $(1 - 1.2B + 0.4B^2)Y_t = (20 - 8.5B)X_{t-1+}$

(3)  $\nabla^2 N_t = (1 - 0.9B + 0.5B^2)a_t$        $(1 - 0.7B)Y_t = 3.0X_{t-1+}$

(4)  $\nabla N_t = (1 - 0.7B)a_t$                   $(1 - 0.4B)Y_t = 5.0X_{t-2+}$


(a) Design the minimum mean square error feedback control schemes associated 
with each combination of disturbance and transfer function model. 

(b) For case (4), derive an expression for the error $e_t$ and for its variance in terms of $\sigma_a^2$.

(c) For case (4), design a nomogram suitable for carrying out the control action 
manually by a process operator. 

15.3. In a treatment plant for industrial waste, the strength $u_t$ of the influent is measured
every 30 minutes and can be represented by the model $\nabla u_t = (1 - 0.5B)a_t$. In the
absence of control, the strength of the effluent $Y_t$ is related to that of the influent $u_t$
by an effect $Y_{1t}$ that can be represented as

$$Y_{1t} = \frac{0.3B}{1 - 0.2B}u_t$$

An increase in strength in the waste may be compensated by an increase in the flow
$X_t$ of a chemical to the plant, whose effect on $Y_t$ is represented by the effect

$$Y_{2t} = \frac{21.6B^2}{1 - 0.7B}X_t$$

Show that minimum mean square error feedforward control is obtained with the
control equation

$$X_t = -\frac{0.3}{21.6}\,\frac{(0.7 - 0.2B)(1 - 0.7B)}{(1 - 0.2B)(1 - 0.5B)}\,u_t$$

that is, $X_t = 0.7X_{t-1} - 0.1X_{t-2} - 0.0139(0.7u_t - 0.69u_{t-1} + 0.14u_{t-2})$.

15.4. A pilot feedback control scheme, based on the following disturbance and transfer
function models:

$$\nabla N_t = a_t$$

$$(1 - \delta B)Y_t = \omega_0 X_{t-1+} - \omega_1 X_{t-2+}$$

was operated, leading to a series of adjustments $x_t$ and errors $e_t$. It was believed
that the noise model was reasonably accurate, but that the parameters of the transfer
function model were of questionable accuracy.

(a) Given the first 10 values of the $x_t$, $e_t$ series shown below:

$t$     $x_t$    $e_t$        $t$     $x_t$    $e_t$
1      25      -7         6      -30     1
2      42      -7         7      -25     3
3      3       -6         8      -25     4
4      20      -7         9      20      0
5      5       -4         10     40      -3

set out the calculation of the residuals $a_t$ ($t = 2, 3, \ldots, 10$) for $\delta = 0.5$,
$\omega_0 = 0.3$, $\omega_1 = 0.2$, and for arbitrary starting values $y_1^0$ and $x_0^0$.

(b) Calculate the values $\hat{y}_1^0$, $\hat{x}_0^0$ of $y_1^0$ and $x_0^0$ that minimize the sum of squares
$\sum_{t=2}^{10}(a_t \mid \delta = 0.5, \omega_0 = 0.3, \omega_1 = 0.2, y_1^0, x_0^0)^2$ and the value of this minimum sum
of squares.

15.5. Consider (Box and MacGregor, 1976) a system for which the process transfer function
is $gB$ and the noise model is $(1 - B)N_t = (1 - \theta B)a_t$ so that the error $e_t$ at the output
satisfies

$$(1 - B)e_t = g(1 - B)X_{t-1+} + (1 - \theta B)a_t$$

Suppose that the system is controlled by a known discrete "integral"
controller

$$(1 - B)X_{t+} = -ce_t$$

(a) Show that the errors $e_t$ at the output will follow the ARMA(1, 1) process

$$(1 - \phi B)e_t = (1 - \theta B)a_t \qquad \phi = 1 - gc$$

and hence that the problem of estimating $g$ and $\theta$ using data from a pilot control
scheme is equivalent to that of estimating the parameters in this ARMA(1, 1)
model.

(b) Show also that the optimal control scheme is such that $c = c_0 = (1 - \theta)/g$ and
hence that if the pilot scheme used in collecting the data happens to be optimal
already, then $1 - \theta$ and $g$ cannot be separately estimated.




PART FIVE 


CHARTS AND TABLES 


This part of the book is a collection of auxiliary material useful in the analysis of time
series. This includes tables and charts for obtaining preliminary estimates of the parameters
in autoregressive-moving average models, together with the usual tail area tables of the
normal, $\chi^2$, and $t$ distributions. This is followed by a listing of the time series analyzed in
the book, as well as some additional time series that are discussed in the exercises located
at the end of the individual chapters.

TABLE A  Table relating $\rho_1$ to $\theta$ for a first-order moving average process

CHART B  Chart relating $\rho_1$ and $\rho_2$ to $\phi_1$ and $\phi_2$ for a second-order autoregressive
process

CHART C  Chart relating $\rho_1$ and $\rho_2$ to $\theta_1$ and $\theta_2$ for a second-order moving average
process

CHART D  Chart relating $\rho_1$ and $\rho_2$ to $\phi$ and $\theta$ for a mixed first-order
autoregressive-moving average process

TABLE E  Tail areas and ordinates of unit normal distribution

TABLE F  Tail areas of the chi-square distribution

TABLE G  Tail areas of the $t$ distribution

Charts B, C, and D are adapted and reproduced from Stralkowski (1968) with
permission of the author. Tables E, F, and G are condensed and adapted from
Biometrika Tables for Statisticians, Volume I, with permission from the trustees of
Biometrika.
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Gregory C. Reinsel, and Greta M. Ljung 
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TABLE A Table Relating p { to 9 for a First-Order Moving Average Process 


0 

Pi 

8 

Pi 

0.00 


0.00 

0.000 

0.05 


-0.05 

0.050 

0.10 


-0.10 

0.099 

0.15 

-0.147 

-0.15 

0.147 

0.20 

-0.192 

-0.20 

0.192 

0.25 

-0.235 

-0.25 

0.235 

0.30 

-0.275 

-0.30 

0.275 

0.35 

-0.315 

-0.35 

0.315 

0.40 

-0.349 

-0.40 

0.349 

0.45 

-0.374 

-0.45 

0.374 

0.50 

-0.400 

-0.50 

0.400 

0.55 

-0.422 

-0.55 

0.422 

0.60 

-0.441 

-0.60 

0.441 

0.65 

-0.457 

-0.65 

0.457 

0.70 

-0.468 

-0.70 

0.468 

0.75 

-0.480 

-0.75 

0.480 

0.80 

-0.488 

-0.80 

0.488 

0.85 

-0.493 

-0.85 

0.493 

0.90 

-0.497 

-0.90 

0.497 

0.95 

-0.499 

-0.95 

0.499 

1.00 

-0.500 

-1.00 

0.500 


Table A may be used to obtain first estimates of the parameters in the (0, d, 1) model
$w_t = (1 - \theta B)a_t$, where $w_t = \nabla^d z_t$, by substituting $r_1(w)$ for $\rho_1$.

CHART B Chart relating $\rho_1$ and $\rho_2$ to $\phi_1$ and $\phi_2$ for a second-order autoregressive process.

The chart may be used to obtain estimates of the parameters in the (2, d, 0) model
$(1 - \phi_1 B - \phi_2 B^2)w_t = a_t$, where $w_t = \nabla^d z_t$, by substituting $r_1(w)$ and $r_2(w)$ for $\rho_1$
and $\rho_2$.
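The lookups provided by Table A and Chart B can also be done numerically. The sketch below uses the standard relations $\rho_1 = -\theta/(1 + \theta^2)$ for the first-order moving average process and the Yule-Walker equations for the second-order autoregressive process; the function names are hypothetical, and sample autocorrelations $r_1$, $r_2$ would be substituted for $\rho_1$, $\rho_2$ in practice.

```python
import math


def ma1_theta_from_rho1(rho1):
    """Invertible theta solving rho1 = -theta / (1 + theta^2); requires |rho1| <= 0.5."""
    if rho1 == 0:
        return 0.0
    if abs(rho1) > 0.5:
        raise ValueError("no real MA(1) solution for |rho1| > 0.5")
    return (-1.0 + math.sqrt(1.0 - 4.0 * rho1**2)) / (2.0 * rho1)


def ar2_phi_from_rho(rho1, rho2):
    """Yule-Walker solution (phi_1, phi_2) for a second-order autoregressive process."""
    d = 1.0 - rho1**2
    return rho1 * (1.0 - rho2) / d, (rho2 - rho1**2) / d


theta0 = ma1_theta_from_rho1(-0.400)
phi1, phi2 = ar2_phi_from_rho(0.5 / 0.7, 0.3 + 0.5 * (0.5 / 0.7))
```

The first call inverts a Table A entry; the second recovers $\phi_1 = 0.5$, $\phi_2 = 0.3$ from the autocorrelations those parameters imply.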





CHART C Chart relating $\rho_1$ and $\rho_2$ to $\theta_1$ and $\theta_2$ for a second-order moving average process.

The chart may be used to obtain estimates of the parameters in the (0, d, 2) model
$w_t = (1 - \theta_1 B - \theta_2 B^2)a_t$, where $w_t = \nabla^d z_t$, by substituting $r_1(w)$ and $r_2(w)$ for $\rho_1$
and $\rho_2$.

CHART D Chart relating $\rho_1$ and $\rho_2$ to $\phi$ and $\theta$ for a mixed first-order autoregressive-moving
average process.

The chart may be used to obtain estimates of the parameters in the (1, d, 1) model
$(1 - \phi B)w_t = (1 - \theta B)a_t$, where $w_t = \nabla^d z_t$, by substituting $r_1(w)$ and $r_2(w)$ for $\rho_1$
and $\rho_2$.
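For the mixed (1, d, 1) model the chart lookup can likewise be replaced by a short calculation. The sketch below relies on the standard ARMA(1, 1) relations $\rho_1 = (1 - \phi\theta)(\phi - \theta)/(1 + \theta^2 - 2\phi\theta)$ and $\rho_2 = \phi\rho_1$, which lead to a quadratic in $\theta$ whose roots are reciprocals; the invertible root is taken. The function names are hypothetical.

```python
import math


def arma11_rho(phi, theta):
    """Theoretical rho_1 and rho_2 of (1 - phi B) w_t = (1 - theta B) a_t."""
    rho1 = (1 - phi * theta) * (phi - theta) / (1 + theta**2 - 2 * phi * theta)
    return rho1, phi * rho1


def arma11_from_rho(rho1, rho2):
    """Preliminary (phi, theta) from rho_1 and rho_2; assumes rho1 != 0."""
    phi = rho2 / rho1
    a = rho1 - phi
    b = 1 + phi**2 - 2 * phi * rho1
    if a == 0:
        return phi, 0.0                      # rho_1 = phi corresponds to theta = 0
    disc = b * b - 4 * a * a
    if disc < 0:
        raise ValueError("no real solution for theta")
    # a*theta^2 + b*theta + a = 0; the roots are reciprocals of each other
    for sign in (1.0, -1.0):
        theta = (-b + sign * math.sqrt(disc)) / (2 * a)
        if abs(theta) < 1:
            return phi, theta
    raise ValueError("no invertible solution for theta")


r1, r2 = arma11_rho(0.8, 0.4)
phi_hat, theta_hat = arma11_from_rho(r1, r2)
```

Here the round trip recovers the parameters used to generate the autocorrelations; with sample values $r_1(w)$, $r_2(w)$ the result is only a starting estimate.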


TABLE E Tail Areas and Ordinates of Unit Normal Distribution$^a$

$u_\varepsilon$     $\varepsilon$       $p(u_\varepsilon)$       $u_\varepsilon$     $\varepsilon$       $p(u_\varepsilon)$
0.0     0.500     0.3989       1.6     0.055     0.1109
0.1     0.460     0.3969       1.7     0.045     0.0940
0.2     0.421     0.3910       1.8     0.036     0.0790
0.3     0.382     0.3814       1.9     0.029     0.0656
0.4     0.345     0.3683       2.0     0.023     0.0540
0.5     0.309     0.3521       2.1     0.018     0.0440
0.6     0.274     0.3332       2.2     0.014     0.0355
0.7     0.242     0.3123       2.3     0.011     0.0283
0.8     0.212     0.2897       2.4     0.008     0.0224
0.9     0.184     0.2661       2.5     0.006     0.0175
1.0     0.159     0.2420       2.6     0.005     0.0136
1.1     0.136     0.2179       2.7     0.003     0.0104
1.2     0.115     0.1942       2.8     0.003     0.0079
1.3     0.097     0.1714       2.9     0.002     0.0059
1.4     0.081     0.1497       3.0     0.001     0.0044
1.5     0.067     0.1295

$^a$ Shown are the values of the unit normal deviate $u_\varepsilon$ such that $\Pr\{u > u_\varepsilon\} = \varepsilon$; also shown are the ordinates
$p(u = u_\varepsilon)$.










































































TABLE F Tail Areas of the Chi-Square Distribution$^a$

                                                         $\varepsilon$
$m$    0.995   0.99    0.975   0.95    0.9     0.75    0.5     0.25    0.1     0.05    0.025   0.01    0.005   0.001   $m$
1     -       -       -       -       0.016   0.102   0.455   1.32    2.71    3.84    5.02    6.63    7.88    10.8    1
2     0.010   0.020   0.051   0.103   0.211   0.575   1.39    2.77    4.61    5.99    7.38    9.21    10.6    13.8    2
3     0.072   0.115   0.216   0.352   0.584   1.21    2.37    4.11    6.25    7.81    9.35    11.3    12.8    16.3    3
4     0.207   0.297   0.484   0.711   1.06    1.92    3.36    5.39    7.78    9.49    11.1    13.3    14.9    18.5    4
5     0.412   0.554   0.831   1.15    1.61    2.67    4.35    6.63    9.24    11.1    12.8    15.1    16.7    20.5    5
6     0.676   0.872   1.24    1.64    2.20    3.45    5.35    7.84    10.6    12.6    14.4    16.8    18.5    22.5    6
7     0.989   1.24    1.69    2.17    2.83    4.25    6.35    9.04    12.0    14.1    16.0    18.5    20.3    24.3    7
8     1.34    1.65    2.18    2.73    3.49    5.07    7.34    10.2    13.4    15.5    17.5    20.1    22.0    26.1    8
9     1.73    2.09    2.70    3.33    4.17    5.90    8.34    11.4    14.7    16.9    19.0    21.7    23.6    27.9    9
10    2.16    2.56    3.25    3.94    4.87    6.74    9.34    12.5    16.0    18.3    20.5    23.2    25.2    29.6    10
11    2.60    3.05    3.82    4.57    5.58    7.58    10.3    13.7    17.3    19.7    21.9    24.7    26.8    31.3    11
12    3.07    3.57    4.40    5.23    6.30    8.44    11.3    14.8    18.5    21.0    23.3    26.2    28.3    32.9    12
13    3.57    4.11    5.01    5.89    7.04    9.30    12.3    16.0    19.8    22.4    24.7    27.7    29.8    34.5    13
14    4.07    4.66    5.63    6.57    7.79    10.2    13.3    17.1    21.1    23.7    26.1    29.1    31.3    36.1    14
15    4.60    5.23    6.26    7.26    8.55    11.0    14.3    18.2    22.3    25.0    27.5    30.6    32.8    37.7    15
16    5.14    5.81    6.91    7.96    9.31    11.9    15.3    19.4    23.5    26.3    28.8    32.0    34.3    39.3    16
17    5.70    6.41    7.56    8.67    10.1    12.8    16.3    20.5    24.8    27.6    30.2    33.4    35.7    40.8    17
18    6.26    7.01    8.23    9.39    10.9    13.7    17.3    21.6    26.0    28.9    31.5    34.8    37.2    42.3    18
19    6.84    7.63    8.91    10.1    11.7    14.6    18.3    22.7    27.2    30.1    32.9    36.2    38.6    43.8    19
20    7.43    8.26    9.59    10.9    12.4    15.5    19.3    23.8    28.4    31.4    34.2    37.6    40.0    45.3    20
21    8.03    8.90    10.3    11.6    13.2    16.3    20.3    24.9    29.6    32.7    35.5    38.9    41.4    46.8    21
22    8.64    9.54    11.0    12.3    14.0    17.2    21.3    26.0    30.8    33.9    36.8    40.3    42.8    48.3    22
23    9.26    10.2    11.7    13.1    14.8    18.1    22.3    27.1    32.0    35.2    38.1    41.6    44.2    49.7    23
24    9.89    10.9    12.4    13.8    15.7    19.0    23.3    28.2    33.2    36.4    39.4    43.0    45.6    51.2    24
25    10.5    11.5    13.1    14.6    16.5    19.9    24.3    29.3    34.4    37.7    40.6    44.3    46.9    52.6    25
26    11.2    12.2    13.8    15.4    17.3    20.8    25.3    30.4    35.6    38.9    41.9    45.6    48.3    54.1    26
27    11.8    12.9    14.6    16.2    18.1    21.7    26.3    31.5    36.7    40.1    43.2    47.0    49.6    55.5    27
28    12.5    13.6    15.3    16.9    18.9    22.7    27.3    32.6    37.9    41.3    44.5    48.3    51.0    56.9    28
29    13.1    14.3    16.0    17.7    19.8    23.6    28.3    33.7    39.1    42.6    45.7    49.6    52.3    58.3    29
30    13.8    15.0    16.8    18.5    20.6    24.5    29.3    34.8    40.3    43.8    47.0    50.9    53.7    59.7    30

$^a$ Shown are the values of $\chi^2_\varepsilon(m)$ such that $\Pr\{\chi^2(m) > \chi^2_\varepsilon(m)\} = \varepsilon$, where $m$ is the number of degrees of freedom.


TABLE G Tail Areas of the t Distribution$^a$

                            $\varepsilon$
$\nu$       0.25    0.10    0.05    0.025   0.01    0.005
1       1.00    3.08    6.31    12.71   31.82   63.66
2       0.82    1.89    2.92    4.30    6.96    9.92
3       0.76    1.64    2.35    3.18    4.54    5.84
4       0.74    1.53    2.13    2.78    3.75    4.60
5       0.73    1.48    2.02    2.57    3.36    4.03
6       0.72    1.44    1.94    2.45    3.14    3.71
7       0.71    1.42    1.90    2.36    3.00    3.50
8       0.71    1.40    1.86    2.31    2.90    3.36
9       0.70    1.38    1.83    2.26    2.82    3.25
10      0.70    1.37    1.81    2.23    2.76    3.17
11      0.70    1.36    1.80    2.20    2.72    3.11
12      0.70    1.36    1.78    2.18    2.68    3.06
13      0.69    1.35    1.77    2.16    2.65    3.01
14      0.69    1.34    1.76    2.14    2.62    2.98
15      0.69    1.34    1.75    2.13    2.60    2.95
16      0.69    1.34    1.75    2.12    2.58    2.92
17      0.69    1.33    1.74    2.11    2.57    2.90
18      0.69    1.33    1.73    2.10    2.55    2.88
19      0.69    1.33    1.73    2.09    2.54    2.86
20      0.69    1.33    1.72    2.09    2.53    2.84
30      0.68    1.31    1.70    2.04    2.46    2.75
40      0.68    1.30    1.68    2.02    2.42    2.70
60      0.68    1.30    1.67    2.00    2.39    2.66
120     0.68    1.29    1.66    1.98    2.36    2.62
$\infty$    0.67    1.28    1.64    1.96    2.33    2.58

$^a$ Shown are the values of $t_\varepsilon(\nu)$ such that $\Pr\{t(\nu) > t_\varepsilon(\nu)\} = \varepsilon$, where $\nu$ is the number of degrees of freedom.





COLLECTION OF TIME SERIES USED FOR EXAMPLES IN THE TEXT AND IN EXERCISES


SERIES A    Chemical process concentration readings: every 2 hours
SERIES B    IBM common stock closing prices: daily, May 17, 1961-November 2, 1962
SERIES B'   IBM common stock closing prices: daily, June 29, 1959-June 30, 1960
SERIES C    Chemical process temperature readings: every minute
SERIES D    Chemical process viscosity readings: every hour
SERIES E    Wolfer sunspot numbers: yearly
SERIES F    Yields from a batch chemical process: consecutive
SERIES G    International airline passengers: monthly totals (thousands of passengers), January 1949-December 1960
SERIES J    Gas furnace data
SERIES K    Simulated dynamic data with two inputs
SERIES L    Pilot scheme data
SERIES M    Sales data with leading indicator
SERIES N    Mink fur sales of the Hudson's Bay Company: annual for 1850-1911
SERIES P    Unemployment and GDP data in UK: quarterly for 1955-1969
SERIES Q    Logged and coded U.S. hog price data: annual for 1867-1948
SERIES R    Monthly averages of hourly readings of ozone in downtown Los Angeles


Time Series Analysis: Forecasting and Control, Fifth Edition. George E. P. Box, Gwilym M. Jenkins, 
Gregory C. Reinsel, and Greta M. Ljung 

© 2016 John Wiley & Sons, Inc. Published 2016 by John Wiley & Sons, Inc.


SERIES A Chemical Process Concentration Readings: Every 2 Hoursᵃ

   1  17.0     41  17.6     81  16.8    121  16.9    161  17.1
   2  16.6     42  17.5     82  16.7    122  17.1    162  17.1
   3  16.3     43  16.5     83  16.4    123  16.8    163  17.1
   4  16.1     44  17.8     84  16.5    124  17.0    164  17.4
   5  17.1     45  17.3     85  16.4    125  17.2    165  17.2
   6  16.9     46  17.3     86  16.6    126  17.3    166  16.9
   7  16.8     47  17.1     87  16.5    127  17.2    167  16.9
   8  17.4     48  17.4     88  16.7    128  17.3    168  17.0
   9  17.1     49  16.9     89  16.4    129  17.2    169  16.7
  10  17.0     50  17.3     90  16.4    130  17.2    170  16.9
  11  16.7     51  17.6     91  16.2    131  17.5    171  17.3
  12  17.4     52  16.9     92  16.4    132  16.9    172  17.8
  13  17.2     53  16.7     93  16.3    133  16.9    173  17.8
  14  17.4     54  16.8     94  16.4    134  16.9    174  17.6
  15  17.4     55  16.8     95  17.0    135  17.0    175  17.5
  16  17.0     56  17.2     96  16.9    136  16.5    176  17.0
  17  17.3     57  16.8     97  17.1    137  16.7    177  16.9
  18  17.2     58  17.6     98  17.1    138  16.8    178  17.1
  19  17.4     59  17.2     99  16.7    139  16.7    179  17.2
  20  16.8     60  16.6    100  16.9    140  16.7    180  17.4
  21  17.1     61  17.1    101  16.5    141  16.6    181  17.5
  22  17.4     62  16.9    102  17.2    142  16.5    182  17.9
  23  17.4     63  16.6    103  16.4    143  17.0    183  17.0
  24  17.5     64  18.0    104  17.0    144  16.7    184  17.0
  25  17.4     65  17.2    105  17.0    145  16.7    185  17.0
  26  17.6     66  17.3    106  16.7    146  16.9    186  17.2
  27  17.4     67  17.0    107  16.2    147  17.4    187  17.3
  28  17.3     68  16.9    108  16.6    148  17.1    188  17.4
  29  17.0     69  17.3    109  16.9    149  17.0    189  17.4
  30  17.8     70  16.8    110  16.5    150  16.8    190  17.0
  31  17.5     71  17.3    111  16.6    151  17.2    191  18.0
  32  18.1     72  17.4    112  16.6    152  17.2    192  18.2
  33  17.5     73  17.7    113  17.0    153  17.4    193  17.6
  34  17.4     74  16.8    114  17.1    154  17.2    194  17.8
  35  17.4     75  16.9    115  17.1    155  16.9    195  17.7
  36  17.1     76  17.0    116  16.7    156  16.8    196  17.2
  37  17.6     77  16.9    117  16.8    157  17.0    197  17.4
  38  17.7     78  17.0    118  16.3    158  17.4
  39  17.4     79  16.6    119  16.6    159  17.2
  40  17.8     80  16.7    120  16.8    160  17.2

ᵃ 197 observations.







SERIES B IBM Common Stock Closing Prices: Daily, May 17, 1961-November 2, 1962ᵃ

  460   471   527   580   551   523   333   394   330
  457   467   540   579   551   516   330   393   340
  452   473   542   584   552   511   336   409   339
  459   481   538   581   553   518   328   411   331
  462   488   541   581   557   517   316   409   345
  459   490   541   577   557   520   320   408   352
  463   489   547   577   548   519   332   393   346
  479   489   553   578   547   519   320   391   352
  493   485   559   580   545   519   333   388   357
  490   491   557   586   545   518   344   396
  492   492   557   583   539   513   339   387
  498   494   560   581   539   499   350   383
  499   499   571   576   535   485   351   388
  497   498   571   571   537   454   350   382
  496   500   569   575   535   462   345   384
  490   497   575   575   536   473   350   382
  489   494   580   573   537   482   359   383
  478   495   584   577   543   486   375   383
  487   500   585   582   548   475   379   388
  491   504   590   584   546   459   376   395
  487   513   599   579   547   451   382   392
  482   511   603   572   548   453   370   386
  479   514   599   577   549   446   365   383
  478   510   596   571   553   455   367   377
  479   509   585   560   553   452   372   364
  477   515   587   549   552   457   373   369
  479   519   585   556   551   449   363   355
  475   523   581   557   550   450   371   350
  479   519   583   563   553   435   369   353
  476   523   592   564   554   415   376   340
  476   531   592   567   551   398   387   350
  478   547   596   561   551   399   387   349
  479   551   596   559   545   361   376   358
  477   547   595   553   547   383   385   360
  476   541   598   553   547   393   385   360
  475   545   598   553   537   385   380   366
  475   549   595   547   539   360   373   359
  473   545   595   550   538   364   382   356
  474   549   592   544   533   365   377   355
  474   547   588   541   525   370   376   367
  474   543   582   532   513   374   379   357
  465   540   576   525   510   359   386   361
  466   539   578   542   521   335   387   355
  467   532   589   555   521   323   386   348
  471   517   585   558   521   306   389   343

ᵃ 369 observations (read down).
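
Tables marked "read down" list the series column by column: the first column is read from top to bottom, then the second, and so on. The sketch below shows one way this ordering might be recovered from rows typed in as printed. The function name `read_down` and the small example grid are hypothetical, and the sketch assumes that any short rows are missing cells only on the right, as in the layout of Series B.

```python
def read_down(rows):
    """Flatten a table typed row by row into one series, reading each column
    from top to bottom. Short rows are assumed to lack cells on the right."""
    width = max(len(row) for row in rows)
    series = []
    for col in range(width):
        for row in rows:
            if col < len(row):      # skip cells that the short rows do not have
                series.append(row[col])
    return series

# Hypothetical 3-column printing of the series 1, 2, ..., 8
# (the last column is one cell short).
printed = [[1, 4, 7],
           [2, 5, 8],
           [3, 6]]
print(read_down(printed))   # prints [1, 2, 3, 4, 5, 6, 7, 8]
```

A quick check after keying in a series this way is to compare `len(read_down(rows))` with the number of observations stated in the table footnote.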






SERIES B' IBM Common Stock Closing Prices: Daily, June 29, 1959-June 30, 1960ᵃ


445 

425 


441 

415 

461 

448 

421 

407 

437 

420 

463 

450 

414 

410 

427 

420 

463 

447 

410 

408 

423 

424 

461 

451 

411 

408 

424 

426 

465 

453 

406 

409 

428 

423 

473 

454 

406 

410 

428 

423 

473 

454 

413 

409 

431 

425 

475 

459 

411 

405 

425 

431 

499 

440 

410 

406 

423 

436 

485 

446 

405 

405 

420 

436 

491 

443 

409 

407 

426 

440 

496 

443 

410 

409 

418 

436 

504 

440 

405 

407 

416 

443 

504 

439 

401 

409 

419 

445 

509 

435 

401 

425 

418 

439 

511 

435 

401 

425 

416 

443 

524 

436 

414 

428 

419 

445 

525 

435 

419 

436 

425 

450 

541 

435 

425 

442 

421 

461 

531 

435 

423 

442 

422 

471 

529 

433 

411 

433 

422 

467 

530 

429 

414 

435 

417 

462 

531 

428 

420 

433 

420 

456 

527 

425 

412 

435 

417 

464 

525 

427 

415 

429 

418 

463 

519 

425 

412 

439 

419 

465 

514 

422 

412 

437 

419 

464 

509 

409 

411 

439 

417 

456 

505 

407 

412 

438 

419 

460 

513 

423 

409 

435 

422 

458 

525 

422 

407 

433 

423 

453 

519 

417 

408 

437 

422 

453 

519 

421 

415 

437 

421 

449 

522 

424 

413 

444 

421 

447 

522 

414 

413 

441 

419 

453 


419 

410 

440 

418 

450 


429 

405 

441 

421 

459 


426 

410 

439 

420 

457 


425 

412 

439 

413 

453 


424 

413 

438 

413 

455 


425 

411 

437 

408 

453 


425 

411 

441 

409 

450 


424 

409 

442 

415 

456 



255 observations (read down). 






SERIES C Chemical Process Temperature Readings: Every Minuteᵃ


26.6 

19.6 

24.4 

21.1 

24.4 

27.0 

19.6 

24.4 

20.9 

24.2 

27.1 

19.6 

24.4 

20.8 

24.2 

27.1 

19.6 

24.4 

20.8 

24.1 

27.1 

19.6 

24.5 

20.8 

24.1 

27.1 

19.7 

24.5 

20.8 

24.0 

26.9 

19.9 

24.4 

20.9 

24.0 

26.8 

20.0 

24.3 

20.8 

24.0 

26.7 

20.1 

24.2 

20.8 

23.9 

26.4 

20.2 

24.2 

20.7 

23.8 

26.0 

20.3 

24.0 

20.7 

23.8 

25.8 

20.6 

23.9 

20.8 

23.7 

25.6 

21.6 

23.7 

20.9 

23.7 

25.2 

21.9 

23.6 

21.2 

23.6 

25.0 

21.7 

23.5 

21.4 

23.7 

24.6 

21.3 

23.5 

21.7 

23.6 

24.2 

21.2 

23.5 

21.8 

23.6 

24.0 

21.4 

23.5 

21.9 

23.6 

23.7 

21.7 

23.5 

22.2 

23.5 

23.4 

22.2 

23.7 

22.5 

23.5 

23.1 

23.0 

23.8 

22.8 

23.4 

22.9 

23.8 

23.8 

23.1 

23.3 

22.8 

24.6 

23.9 

23.4 

23.3 

22.7 

25.1 

23.9 

23.8 

23.3 

22.6 

25.6 

23.8 

24.1 

23.4 

22.4 

25.8 

23.7 

24.6 

23.4 

22.2 

26.1 

23.6 

24.9 

23.3 

22.0 

26.3 

23.4 

24.9 

23.2 

21.8 

26.3 

23.2 

25.1 

23.3 

21.4 

26.2 

23.0 

25.0 

23.3 

20.9 

26.0 

22.8 

25.0 

23.2 

20.3 

25.8 

22.6 

25.0 

23.1 

19.7 

25.6 

22.4 

25.0 

22.9 

19.4 

25.4 

22.0 

24.9 

22.8 

19.3 

25.2 

21.6 

24.8 

22.6 

19.2 

24.9 

21.3 

24.7 

22.4 

19.1 

24.7 

21.2 

24.6 

22.2 

19.0 

24.5 

21.2 

24.5 

21.8 

18.9 

24.4 

21.1 

24.5 

21.3 

18.9 

24.4 

21.0 

24.5 

20.8 

19.2 

24.4 

20.9 

24.5 

20.2 

19.3 

24.4 

21.0 

24.5 

19.7 

19.3 

24.4 

21.0 

24.5 

19.3 

19.4 

24.3 

21.1 

24.5 

19.1 

19.5 

24.4 

21.2 

24.4 

19.0 


18.8 


226 observations (read down). 
SERIES D Chemical Process Viscosity Readings: Every Hourᵃ


8.0 

8.8 

9.3 

9.1 

9.0 

10.0 

9.6 

8.0 

8.6 

9.9 

9.5 

9.0 

9.8 

8.6 

7.4 

8.6 

9.7 

9.4 

9.4 

9.8 

8.0 

8.0 

8.4 

9.1 

9.5 

9.0 

9.7 

8.0 

8.0 

8.3 

9.3 

9.6 

9.0 

9.6 

8.0 

8.0 

8.4 

9.5 

10.2 

9.4 

9.4 

8.0 

8.0 

8.3 

9.4 

9.8 

9.4 

9.2 

8.4 

8.8 

8.3 

9.0 

9.6 

9.6 

9.0 

8.8 

8.4 

8.1 

9.0 

9.6 

9.4 

9.4 

8.4 

8.4 

8.2 

8.8 

9.4 

9.6 

9.6 

8.4 

8.0 

8.3 

9.0 

9.4 

9.6 

9.6 

9.0 

8.2 

8.5 

8.8 

9.4 

9.6 

9.6 

9.0 

8.2 

8.1 

8.6 

9.4 

10.0 

9.6 

9.4 

8.2 

8.1 

8.6 

9.6 

10.0 

9.6 

10.0 

8.4 

7.9 

8.0 

9.6 

9.6 

9.6 

10.0 

8.4 

8.3 

8.0 

9.4 

9.2 

9.0 

10.0 

8.4 

8.1 

8.0 

9.4 

9.2 

9.4 

10.2 

8.6 

8.1 

8.0 

9.0 

9.2 

9.4 

10.0 

8.8 

8.1 

8.6 

9.4 

9.0 

9.4 

10.0 

8.6 

8.4 

8.0 

9.4 

9.0 

9.6 

9.6 

8.6 

8.7 

8.0 

9.6 

9.6 

9.4 

9.0 

8.6 

9.0 

8.0 

9.4 

9.8 

9.6 

9.0 

8.6 

9.3 

7.6 

9.2 

10.2 

9.6 

8.6 

8.6 

9.3 

8.6 

8.8 

10.0 

9.8 

9.0 

8.8 

9.5 

9.6 

8.8 

10.0 

9.8 

9.6 

8.9 

9.3 

9.6 

9.2 

10.0 

9.8 

9.6 

9.1 

9.5 

10.0 

9.2 

9.4 

9.6 

9.0 

9.5 

9.5 

9.4 

9.6 

9.2 

9.2 

9.0 

8.5 

9.5 

9.3 

9.6 

9.6 

9.6 

8.9 

8.4 

9.5 

9.2 

9.8 

9.7 

9.2 

8.8 

8.3 

9.5 

9.5 

9.8 

9.7 

9.2 

8.7 

8.2 

9.5 

9.5 

10.0 

9.8 

9.6 

8.6 

8.1 

9.9 

9.5 

10.0 

9.8 

9.6 

8.3 

8.3 

9.5 

9.9 

9.4 

9.8 

9.6 

7.9 

8.4 

9.7 

9.9 

9.8 

10.0 

9.6 

8.5 

8.7 

9.1 

9.5 

8.8 

10.0 

9.6 

8.7 

8.8 

9.1 

9.3 

8.8 

8.6 

9.6 

8.9 

8.8 

8.9 

9.5 

8.8 

9.0 

10.0 

9.1 

9.2 

9.3 

9.5 

8.8 

9.4 

10.0 

9.1 

9.6 

9.1 

9.1 

9.6 

9.4 

10.4 

9.1 

9.0 

9.1 

9.3 

9.6 

9.4 

10.4 


8.8 

9.3 

9.5 

9.6 

9.4 

9.8 


8.6 

9.5 

9.3 

9.2 

9.4 

9.0 


8.6 

9.3 

9.1 

9.2 

9.6 

9.6 


8.8 

9.3 

9.3 

9.0 

10.0 

9.8 



ᵃ 310 observations (read down).
SERIES E Wolfer Sunspot Numbers: Yearlyᵃ

  1770   101     1795    21     1820    16     1845    40
  1771    82     1796    16     1821     7     1846    62
  1772    66     1797     6     1822     4     1847    98
  1773    35     1798     4     1823     2     1848   124
  1774    31     1799     7     1824     8     1849    96
  1775     7     1800    14     1825    17     1850    66
  1776    20     1801    34     1826    36     1851    64
  1777    92     1802    45     1827    50     1852    54
  1778   154     1803    43     1828    62     1853    39
  1779   125     1804    48     1829    67     1854    21
  1780    85     1805    42     1830    71     1855     7
  1781    68     1806    28     1831    48     1856     4
  1782    38     1807    10     1832    28     1857    23
  1783    23     1808     8     1833     8     1858    55
  1784    10     1809     2     1834    13     1859    94
  1785    24     1810     0     1835    57     1860    96
  1786    83     1811     1     1836   122     1861    77
  1787   132     1812     5     1837   138     1862    59
  1788   131     1813    12     1838   103     1863    44
  1789   118     1814    14     1839    86     1864    47
  1790    90     1815    35     1840    63     1865    30
  1791    67     1816    46     1841    37     1866    16
  1792    60     1817    41     1842    24     1867     7
  1793    47     1818    30     1843    11     1868    37
  1794    41     1819    24     1844    15     1869    74

ᵃ 100 observations.


SERIES F Yields from a Batch Chemical Process: Consecutiveᵃ

  47   44   50   62   68
  64   80   71   44   38
  23   55   56   64   50
  71   37   74   43   60
  38   74   50   52   39
  64   51   58   38   59
  55   57   45   59   40
  41   50   54   55   57
  59   60   36   41   54
  48   45   54   53   23
  71   57   48   49
  35   50   55   34
  57   45   45   35
  40   25   57   54
  58   59   50   45

ᵃ 70 observations (read down).
SERIES G International Airline Passengers: Monthly Totals (Thousands of Passengers), January 1949-December 1960ᵃ

        Jan.  Feb.  Mar.  Apr.  May   June  July  Aug.  Sept. Oct.  Nov.  Dec.
  1949   112   118   132   129   121   135   148   148   136   119   104   118
  1950   115   126   141   135   125   149   170   170   158   133   114   140
  1951   145   150   178   163   172   178   199   199   184   162   146   166
  1952   171   180   193   181   183   218   230   242   209   191   172   194
  1953   196   196   236   235   229   243   264   272   237   211   180   201
  1954   204   188   235   227   234   264   302   293   259   229   203   229
  1955   242   233   267   269   270   315   364   347   312   274   237   278
  1956   284   277   317   313   318   374   413   405   355   306   271   306
  1957   315   301   356   348   355   422   465   467   404   347   305   336
  1958   340   318   362   348   363   435   491   505   404   359   310   337
  1959   360   342   406   396   420   472   548   559   463   407   362   405
  1960   417   391   419   461   472   535   622   606   508   461   390   432

ᵃ 144 observations.


SERIES J Gas Furnace Dataᵃ


t 

x, 

Y, 

t 


Y, 

t 

x, 

Y, 

1 

-0.109 

53.8 

51 

1.608 

46.9 

101 

-0.288 

51.0 

2 

0.000 

53.6 

52 

1.905 

47.8 

102 

-0.153 

51.8 

3 

0.178 

53.5 

53 

2.023 

48.2 

103 

-0.109 

52.4 

4 

0.339 

53.5 

54 

1.815 

48.3 

104 

-0.187 

53.0 

5 

0.373 

53.4 

55 

0.535 

47.9 

105 

-0.255 

53.4 

6 

0.441 

53.1 

56 

0.122 

47.2 

106 

-0.229 

53.6 

7 

0.461 

52.7 

57 

0.009 

47.2 

107 

-0.007 

53.7 

8 

0.348 

52.4 

58 

0.164 

48.1 

108 

0.254 

53.8 

9 

0.127 

52.2 

59 

0.671 

49.4 

109 

0.330 

53.8 

10 

-0.180 

52.0 

60 

1.019 

50.6 

110 

0.102 

53.8 

11 

-0.588 

52.0 

61 

1.146 

51.5 

111 

-0.423 

53.3 

12 

-1.055 

52.4 

62 

1.155 

51.6 

112 

-1.139 

53.0 

13 

-1.421 

53.0 

63 

1.112 

51.2 

113 

-2.275 

52.9 

14 

-1.520 

54.0 

64 

1.121 

50.5 

114 

-2.594 

53.4 

15 

-1.302 

54.9 

65 

1.223 

50.1 

115 

-2.716 

54.6 

16 

-0.814 

56.0 

66 

1.257 

49.8 

116 

-2.510 

56.4 

17 

-0.475 

56.8 

67 

1.157 

49.6 

117 

-1.790 

58.0 

18 

-0.193 

56.8 

68 

0.913 

49.4 

118 

-1.346 

59.4 

19 

0.088 

56.4 

69 

0.620 

49.3 

119 

-1.081 

60.2 

20 

0.435 

55.7 

70 

0.255 

49.2 

120 

-0.910 

60.0 

21 

0.771 

55.0 

71 

-0.280 

49.3 

121 

-0.876 

59.4 

22 

0.866 

54.3 

72 

-1.080 

49.7 

122 

-0.885 

58.4 

23 

0.875 

53.2 

73 

-1.551 

50.3 

123 

-0.800 

57.6 

24 

0.891 

52.3 

74 

-1.799 

51.3 

124 

-0.544 

56.9 

25 

0.987 

51.6 

75 

-1.825 

52.8 

125 

-0.416 

56.4 

26 

1.263 

51.2 

76 

-1.456 

54.4 

126 

-0.271 

56.0 

27 

1.775 

50.8 

77 

-0.944 

56.0 

127 

0.000 

55.7 

28 

1.976 

50.5 

78 

-0.570 

56.9 

128 

0.403 

55.3 

29 

1.934 

50.0 

79 

-0.431 

57.5 

129 

0.841 

55.0 

30 

1.866 

49.2 

80 

-0.577 

57.3 

130 

1.285 

54.4 

31 

1.832 

48.4 

81 

-0.960 

56.6 

131 

1.607 

53.7 

32 

1.767 

47.9 

82 

-1.616 

56.0 

132 

1.746 

52.8 

33 

1.608 

47.6 

83 

-1.875 

55.4 

133 

1.683 

51.6 

34 

1.265 

47.5 

84 

-1.891 

55.4 

134 

1.485 

50.6 

35 

0.790 

47.5 

85 

-1.746 

56.4 

135 

0.993 

49.4 

36 

0.360 

47.6 

86 

-1.474 

57.2 

136 

0.648 

48.8 

37 

0.115 

48.1 

87 

-1.201 

58.0 

137 

0.577 

48.5 

38 

0.088 

49.0 

88 

-0.927 

58.4 

138 

0.577 

48.7 

39 

0.331 

50.0 

89 

-0.524 

58.4 

139 

0.632 

49.2 

40 

0.645 

51.1 

90 

0.040 

58.1 

140 

0.747 

49.8 

41 

0.960 

51.8 

91 

0.788 

57.7 

141 

0.900 

50.4 

42 

1.409 

51.9 

92 

0.943 

57.0 

142 

0.993 

50.7 

43 

2.670 

51.7 

93 

0.930 

56.0 

143 

0.968 

50.9 

44 

2.834 

51.2 

94 

1.006 

54.7 

144 

0.790 

50.7 

45 

2.812 

50.0 

95 

1.137 

53.2 

145 

0.399 

50.5 

46 

2.483 

48.3 

96 

1.198 

52.1 

146 

-0.161 

50.4 

47 

1.929 

47.0 

97 

1.054 

51.6 

147 

-0.553 

50.2 

48 

1.485 

45.8 

98 

0.595 

51.0 

148 

-0.603 

50.4 

49 

1.214 

45.6 

99 

-0.080 

50.5 

149 

-0.424 

51.2 

50 

1.239 

46.0 

100 

-0.314 

50.4 

150 

-0.194 

52.3 

151 

-0.049 

53.2 

201 

-2.473 

55.6 

251 

0.185 

56.3 

152 

0.060 

53.9 

202 

-2.330 

58.0 

252 

0.662 

56.4 

153 

0.161 

54.1 

203 

-2.053 

59.5 

253 

0.709 

56.4 

154 

0.301 

54.0 

204 

-1.739 

60.0 

254 

0.605 

56.0 

155 

0.517 

53.6 

205 

-1.261 

60.4 

255 

0.501 

55.2 

156 

0.566 

53.2 

206 

-0.569 

60.5 

256 

0.603 

54.0 

157 

0.560 

53.0 

207 

-0.137 

60.2 

257 

0.943 

53.0 

158 

0.573 

52.8 

208 

-0.024 

59.7 

258 

1.223 

52.0 

159 

0.592 

52.3 

209 

-0.050 

59.0 

259 

1.249 

51.6 

160 

0.671 

51.9 

210 

-0.135 

57.6 

260 

0.824 

51.6 

161 

0.933 

51.6 

211 

-0.276 

56.4 

261 

0.102 

51.1 

162 

1.337 

51.6 

212 

-0.534 

55.2 

262 

0.025 

50.4 

163 

1.460 

51.4 

213 

-0.871 

54.5 

263 

0.382 

50.0 

164 

1.353 

51.2 

214 

-1.243 

54.1 

264 

0.922 

50.0 

165 

0.772 

50.7 

215 

-1.439 

54.1 

265 

1.032 

52.0 

166 

0.218 

50.0 

216 

-1.422 

54.4 

266 

0.866 

54.0 

167 

-0.237 

49.4 

217 

-1.175 

55.5 

267 

0.527 

55.1 

168 

-0.714 

49.3 

218 

-0.813 

56.2 

268 

0.093 

54.5 

169 

-1.099 

49.7 

219 

-0.634 

57.0 

269 

-0.458 

52.8 

170 

-1.269 

50.6 

220 

-0.582 

57.3 

270 

-0.748 

51.4 

171 

-1.175 

51.8 

221 

-0.625 

57.4 

271 

-0.947 

50.8 

172 

-0.676 

53.0 

222 

-0.713 

57.0 

272 

-1.029 

51.2 

173 

0.033 

54.0 

223 

-0.848 

56.4 

273 

-0.928 

52.0 

174 

0.556 

55.3 

224 

-1.039 

55.9 

274 

-0.645 

52.8 

175 

0.643 

55.9 

225 

-1.346 

55.5 

275 

-0.424 

53.8 

176 

0.484 

55.9 

226 

-1.628 

55.3 

276 

-0.276 

54.5 

177 

0.109 

54.6 

227 

-1.619 

55.2 

277 

-0.158 

54.9 

178 

-0.310 

53.5 

228 

-1.149 

55.4 

278 

-0.033 

54.9 

179 

-0.697 

52.4 

229 

-0.488 

56.0 

279 

0.102 

54.8 

180 

-1.047 

52.1 

230 

-0.160 

56.5 

280 

0.251 

54.4 

181 

-1.218 

52.3 

231 

-0.007 

57.1 

281 

0.280 

53.7 

182 

-1.183 

53.0 

232 

-0.092 

57.3 

282 

0.000 

53.3 

183 

-0.873 

53.8 

233 

-0.620 

56.8 

283 

-0.493 

52.8 

184 

-0.336 

54.6 

234 

-1.086 

55.6 

284 

-0.759 

52.6 

185 

0.063 

55.4 

235 

-1.525 

55.0 

285 

-0.824 

52.6 

186 

0.084 

55.9 

236 

-1.858 

54.1 

286 

-0.740 

53.0 

187 

0.000 

55.9 

237 

-2.029 

54.3 

287 

-0.528 

54.3 

188 

0.001 

55.2 

238 

-2.024 

55.3 

288 

-0.204 

56.0 

189 

0.209 

54.4 

239 

-1.961 

56.4 

289 

0.034 

57.0 

190 

0.556 

53.7 

240 

-1.952 

57.2 

290 

0.204 

58.0 

191 

0.782 

53.6 

241 

-1.794 

57.8 

291 

0.253 

58.6 

192 

0.858 

53.6 

242 

-1.302 

58.3 

292 

0.195 

58.5 

193 

0.918 

53.2 

243 

-1.030 

58.6 

293 

0.131 

58.3 

194 

0.862 

52.5 

244 

-0.918 

58.8 

294 

0.017 

57.8 

195 

0.416 

52.0 

245 

-0.798 

58.8 

295 

-0.182 

57.3 

196 

-0.336 

51.4 

246 

-0.867 

58.6 

296 

-0.262 

57.0 

197 

-0.959 

51.0 

247 

-1.047 

58.0 




198 

-1.813 

50.9 

248 

-1.123 

57.4 




199 

-2.378 

52.4 

249 

-0.876 

57.0 




200 

-2.499 

53.5 

250 

-0.395 

56.4 





ᵃ Sampling interval 9 seconds; observations for 296 pairs of data points. $X_t = 0.60 - 0.04\,(\text{input gas rate in cubic feet per minute})$; $Y_t = \%\,\mathrm{CO_2}$ in outlet gas.
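
The coding of the input series can be undone with one line of arithmetic. The sketch below assumes the footnote is read as X = 0.60 - 0.04 × (gas rate), so that the gas rate is (0.60 - X) / 0.04; the function names are hypothetical and serve only to illustrate the conversion.

```python
def coded_to_gas_rate(x):
    """Invert the assumed coding X = 0.60 - 0.04 * rate (rate in cubic feet per minute)."""
    return (0.60 - x) / 0.04

def gas_rate_to_coded(rate):
    """Apply the assumed coding to a gas rate in cubic feet per minute."""
    return 0.60 - 0.04 * rate

# First three coded input values of Series J (t = 1, 2, 3).
for x in [-0.109, 0.000, 0.178]:
    print(x, round(coded_to_gas_rate(x), 3))
```

Under this reading a coded value of zero corresponds to a feed of 15 cubic feet per minute, and positive coded values correspond to lower gas rates.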
SERIES K Simulated Dynamic Data with Two Inputsᵃ


t 

X u 

X* 

y, 

t 

X It 

x 2( 

y, 

-2 

0 

0 

58.3 





-1 



61.8 





0 



64.2 

30 



65.8 

1 



62.1 

31 



67.4 

2 

-1 

1 

55.1 

32 

-1 

-1 

64.7 

3 



50.6 

33 



65.7 

4 



47.8 

34 



67.5 

5 



49.7 

35 



58.2 

6 



51.6 

36 



57.0 

7 

1 

-1 

58.5 

37 

-1 

1 

54.7 

8 



61.5 

38 



54.9 

9 



63.3 

39 



48.4 

10 



65.9 

40 



49.7 

11 



70.9 

41 



53.1 

12 

-1 

-1 

65.8 

42 

1 

-1 

50.2 

13 



57.6 

43 



51.7 

14 



56.1 

44 



57.4 

15 



58.2 

45 



62.6 

16 



61.7 

46 



65.8 

17 

1 

1 

59.2 

47 

-1 

-1 

61.5 

18 



57.9 

48 



61.5 

19 



61.3 

49 



56.8 

20 



60.8 

50 



62.3 

21 



63.6 

51 



57.7 

22 

1 

-1 

69.5 

52 

-1 

1 

54.0 

23 



69.3 

53 



45.2 

24 



70.5 

54 



51.9 

25 



68.0 

55 



45.6 

26 



68.1 

56 



46.2 

27 

1 

1 

65.0 

57 

1 

1 

50.2 

28 



71.9 

58 



54.6 

29 



64.8 

59 



55.6 





60 

0 

0 

60.4 





61 



59.4 


64 observations. 
SERIES L Pilot Scheme Dataᵃ
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x t 
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t 

*» 
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1 
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-4 
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55 

-4 

2 
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-2 

54 

50 
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0 

2 

3 
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0 

55 
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0 

107 

-90 

8 

4 

0 

0 

56 

40 

-4 

108 

40 

0 

5 

-40 

4 

57 

40 

-6 

109 

0 

0 

6 

0 

2 

58 

-30 

0 

110 

80 

-8 

7 

-10 

2 

59 

20 

-2 

111 

-20 

-2 

8 

10 

0 

60 

-30 

2 

112 

-10 

0 

9 

20 

-2 

61 

10 

0 

113 

-70 

6 

10 

50 

-6 

62 

-20 

2 

114 

-30 

6 

11 

-10 

-2 

63 

30 

-2 

115 

-10 

4 

12 

-55 

4 

64 

-50 

4 

116 

30 

-1 

13 

0 

2 

65 

10 

-2 

117 

-5 

0 

14 

10 

0 

66 

10 

-2 

118 

-60 

6 

15 

0 

-2 

67 

10 

-2 

119 

70 

-4 

16 

10 

-2 

68 

-30 

0 

120 

40 

-6 

17 

-70 

6 

69 

0 

0 

121 

10 

-4 

18 

30 

0 

70 

-10 

2 

122 

20 

-4 

19 

-20 

2 

71 

-10 

3 

123 

10 

-3 

20 

10 

0 

72 

15 

0 
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0 

-2 

21 

0 

0 

73 

20 

-2 
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-70 

6 

22 

0 

0 

74 

-50 

4 

126 

50 

-2 

23 

20 

-2 

75 

20 

0 

127 

30 

-4 

24 

30 

-4 

76 

0 

0 

128 

0 

-2 

25 

0 

-2 

77 

0 

0 

129 
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0 

26 
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0 

78 

0 

0 

130 

0 

0 

27 

-20 

2 

79 

0 

0 
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4 

28 

-30 

4 

80 
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4 

132 

0 

2 

29 

0 

2 

81 

-100 

12 

133 

-10 

2 

30 

10 

0 

82 

0 

8 

134 

10 

0 

31 

20 

-2 

83 

0 

-12 

135 

0 

0 

32 

-10 

0 

84 

50 

-15 

136 

80 

-8 

33 

0 

0 

85 

85 

-15 

137 

-80 

4 

34 

20 

-2 

86 

5 

-12 

138 

20 

4 

35 

10 

-2 

87 

40 

-14 

139 

20 

0 

36 

-10 

0 

88 

10 

-8 

140 

-10 

2 

37 

0 

0 

89 

-60 

2 

141 

10 

0 

38 

0 

0 

90 

-50 

6 

142 

0 

0 

39 

0 

0 

91 

-50 

8 

143 

-20 

2 

40 

0 

0 

92 

40 

0 

144 

20 

-1 

41 

0 

0 

93 

0 

0 

145 

55 

-6 

42 

0 

0 

94 

0 

0 

146 

0 

-3 

43 

20 

-2 

95 

-20 

2 

147 

25 

-4 

44 

-50 

4 

96 

-30 

4 

148 

20 

-4 

45 

20 

0 

97 

-60 

8 
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-60 

4 

46 

0 

0 

98 

-20 

6 

150 

-40 

6 

47 

0 

0 

99 

-30 

6 

151 

10 

4 

48 

40 

-4 
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30 

0 
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20 

0 

49 
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60 

-6 

50 
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80 
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0 
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3 
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2 
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4 
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20 

0 
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4 
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4 
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0 

0 
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40 

-2 
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35 

-2 

159 

20 

-2 
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-90 

8 
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70 

8 
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10 

-2 
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40 

0 
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-5 
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10 

-2 
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0 

0 
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10 

-22 

214 

0 

0 
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-8 
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50 

-6 

215 

0 

0 
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-40 

0 

164 

-30 

0 

216 

20 

-2 
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-20 

2 
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-30 

6 

217 

90 

-10 

269 

10 

0 

166 

90 

12 

218 

30 

-8 

270 

0 

0 

167 

60 

0 

219 

20 

-6 

271 

0 

0 

168 

-40 

4 

220 

30 

-6 

272 

-20 

2 

169 

20 

0 

221 

30 

-6 

273 

-50 

6 

170 

0 

0 

222 

30 

-6 

274 

50 

-2 

171 

20 

-2 

223 

30 

-6 

275 

30 

-4 

172 

10 

-2 

224 

-90 

6 

276 

60 

-8 

173 

-30 

2 

225 

10 

2 

277 

-40 

0 
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-30 

4 

226 

10 

2 
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-20 

2 

175 

0 

2 
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-30 

4 

279 

-10 

2 
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50 

-4 
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4 

280 

10 

0 
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-60 

4 

229 

40 

-2 

281 

-110 

13 
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20 

0 
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-2 
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4 
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0 

0 

231 
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-2 

283 
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-2 

180 

40 

-8 
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10 

-2 

284 
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-1 
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-100 

12 

285 
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-3 
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-8 
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10 

6 

286 

-5 

-1 
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6 
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45 

-2 
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1 
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-4 
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-4 
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30 

-5 
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40 

-6 
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2 
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20 
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-1 
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6 
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60 

-6 
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-85 
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20 

1 
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-2 
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5 

0 
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-4 
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0 
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4 
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2 
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60 

-4 
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0 

6 
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-2 

245 

40 

-6 
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8 
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1 
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0 
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40 

0 
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5 

0 
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-40 

4 
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-20 

2 
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-20 

2 

248 

-40 

6 

300 
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4 
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50 

-2 
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-4 
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20 

0 
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10 

-2 
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0 

-2 
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10 

-1 
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-4 
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30 
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1 
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0 
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255 
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6 
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3 
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-5 

0 
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0 
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5 
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0 
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“312 observations. 

SERIES M Sales Data with Leading Indicatorᵃ

                          95   12.92   245.3
 46           218.7       96   12.64   246.0      146   13.50   263.3
 47   10.99   222.9       97   12.79   246.3      147   13.58   262.8
 48   11.01   224.9       98   13.05   247.7      148   13.51   261.8
 49   10.84   222.2       99   12.69   247.6      149   13.77
 50   10.76   220.7      100   13.01   247.8      150   13.40

ᵃ 150 observations.

SERIES N Mink Fur Sales of the Hudson’s Bay Company: Annual for 1850-1911ᵃ

1850   29,619    1866   51,404    1882   45,600    1897   76,365
1851   21,151    1867   58,451    1883   47,508    1898   70,407
1852   24,859    1868   73,575    1884   52,290    1899   41,839
1853   25,152    1869   74,343    1885  110,824    1900   45,978
1854   42,375    1870   27,708    1886   76,503    1901   47,813
1855   50,839    1871   31,985    1887   64,303    1902   57,620
1856   61,581    1872   39,266    1888   83,023    1903   66,549
1857   61,951    1873   44,740    1889   40,748    1904   54,673
1858   76,231    1874   60,429    1890   35,596    1905   55,996
1859   63,264    1875   72,273    1891   29,479    1906   60,053
1860   44,730    1876   79,214    1892   42,264    1907   39,169
1861   31,094    1877   79,060    1893   58,171    1908   21,534
1862   49,452    1878   84,244    1894   50,815    1909   17,857
1863   43,961    1879   62,590    1895   51,285    1910   21,788
1864   61,727    1880   35,072    1896   70,229    1911   33,008
1865   60,334    1881   36,160

ᵃ 62 observations.


SERIES P Unemployment and GDP Data in UK: Quarterly for 1955-1969ᵃ

           UN     GDP               UN     GDP               UN     GDP
1955  1   225   81.37    1960  1   363   92.30    1965  1   306  108.07
      2   208   82.60          2   342   92.13          2   304  107.64
      3   201   82.30          3   325   93.17          3   321  108.87
      4   199   83.00          4   312   93.50          4   305  109.75
1956  1   207   82.87    1961  1   291   94.77    1966  1   279  110.20
      2   215   83.60          2   293   95.37          2   282  110.20
      3   240   83.33          3   304   95.03          3   318  110.90
      4   245   83.53          4   330   95.23          4   414  110.40
1957  1   295   84.27    1962  1   357   95.07    1967  1   463  111.00
      2   293   85.50          2   401   96.40          2   506  112.10
      3   279   84.33          3   447   96.97          3   538  112.50
      4   287   84.30          4   483   96.50          4   536  113.00
1958  1   331   85.07    1963  1   535   96.16    1968  1   544  114.30
      2   396   83.60          2   520   99.79          2   541  115.10
      3   432   84.37          3   489  101.14          3   547  116.40
      4   462   84.50          4   456  102.95          4   532  117.80
1959  1   454   85.20    1964  1   386  103.96    1969  1   532  116.80
      2   446   87.07          2   368  105.28          2   519  117.80
      3   426   88.40          3   358  105.81          3   547  119.00
      4   402   90.03          4   330  107.14          4   544  119.60

Source: Bray (1971).

ᵃ 60 pairs of data; data are seasonally adjusted; unemployment (UN) in thousands; gross domestic product (GDP)
is composite estimate (1963 = 100).

SERIES Q Logged and Coded U.S. Hog Price Data: Annual for 1867-1948ᵃ

1867   597    1888   709    1909   810    1929  1112
1868   509    1889   763    1910   957    1930  1129
1869   663    1890   681    1911   970    1931  1055
1870   751    1891   627    1912   903    1932   787
1871   739    1892   667    1913   995    1933   624
1872   598    1893   804    1914  1022    1934   612
1873   556    1894   782    1915   998    1935   800
1874   594    1895   707    1916   928    1936  1104
1875   667    1896   653    1917  1073    1937  1075
1876   776    1897   639    1918  1294    1938  1052
1877   754    1898   672    1919  1346    1939  1048
1878   689    1899   669    1920  1301    1940   891
1879   498    1900   729    1921  1134    1941   921
1880   643    1901   784    1922  1024    1942  1193
1881   681    1902   842    1923  1090    1943  1352
1882   778    1903   886    1924  1013    1944  1243
1883   829    1904   784    1925  1119    1945  1314
1884   751    1905   770    1926  1195    1946  1380
1885   704    1906   783    1927  1235    1947  1556
1886   633    1907   877    1928  1120    1948  1632
1887   663    1908   111

Source: Quenouille (1957).

ᵃ 82 observations; values are 1000 log10(H_t), where H_t is the price, in dollars, per head on January 1 of the year.
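
The coding described in the Series Q footnote (each value is 1000 times the base-10 logarithm of the price) can be inverted to recover approximate prices in dollars per head. The following is a minimal sketch in Python, not part of the original text; the function name `decode_price` and the small dictionary of sample years are illustrative assumptions, with the coded values copied from the table.

```python
# Series Q coding (from the footnote): coded = 1000 * log10(H_t).
# Inverting gives H_t = 10 ** (coded / 1000), in dollars per head.

# A few coded values copied from the Series Q table (illustrative subset).
coded = {1867: 597, 1868: 509, 1869: 663, 1948: 1632}


def decode_price(coded_value):
    """Invert the 1000 * log10 coding (hypothetical helper name)."""
    return 10 ** (coded_value / 1000)


# Decode the sample years and print them in chronological order.
prices = {year: decode_price(value) for year, value in coded.items()}
for year in sorted(prices):
    print(year, round(prices[year], 2))
```

Because the coded values are rounded to integers, the recovered prices are only accurate to roughly a quarter of one percent.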


SERIES R Monthly Averages of Hourly Readings of Ozone in Downtown Los Angelesᵃ

        Jan.   Feb.   Mar.   Apr.    May   June   July   Aug.  Sept.   Oct.   Nov.   Dec.
1955    2.63   1.94   3.38   4.92   6.29   5.58   5.50   4.71   6.04   7.13   7.79   3.83
1956    3.83   4.25   5.29   3.75   4.67   5.42   6.04   5.71   8.13   4.88   5.42   5.50
1957    3.00   3.42   4.50   4.25   4.00   5.33   5.79   6.58   7.29   5.04   5.04   4.48
1958    3.33   2.88   2.50   3.83   4.17   4.42   4.25   4.08   4.88   4.54   4.25   4.21
1959    2.75   2.42   4.50   5.21   4.00   7.54   7.38   5.96   5.08   5.46   4.79   2.67
1960    1.71   1.92   3.38   3.98   4.63   4.88   5.17   4.83   5.29   3.71   2.46   2.17
1961    2.15   2.44   2.54   3.25   2.81   4.21   4.13   4.17   3.75   3.83   2.42   2.17
1962    2.33   2.00   2.13   4.46   3.17   3.25   4.08   5.42   4.50   4.88   2.83   2.75
1963    1.63   3.04   2.58   2.92   3.29   3.71   4.88   4.63   4.83   3.42   2.38   2.33
1964    1.50   2.25   2.63   2.96   3.46   4.33   5.42   4.79   4.38   4.54   2.04   1.33
1965    2.04   2.81   2.67   4.08   3.90   3.96   4.50   5.58   4.52   5.88   3.67   1.79
1966    1.71   1.92   3.58   4.40   3.79   5.52   5.50   5.00   5.48   4.81   2.42   1.46
1967    1.71   2.46   2.42   1.79   3.63   3.54   4.88   4.96   3.63   5.46   3.08   1.75
1968    2.13   2.58   2.75   3.15   3.46   3.33   4.67   4.13   4.73   3.42   3.08   1.79
1969    1.96   1.63   2.75   3.06   4.31   3.31   3.71   5.25   3.67   3.10   2.25   2.29
1970    1.25   2.25   2.67   3.23   3.58   3.04   3.75   4.54   4.46   2.83   1.63   1.17
1971    1.79   1.92   2.25   2.96   2.38   3.38   3.38   3.21   2.58   2.42   1.58   1.21
1972    1.42   1.96   3.04   2.92   3.58   3.33   4.04   3.92   3.08   2.00   1.58   1.21

ᵃ 216 observations; values are in pphm.
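
Series R is laid out with one row per year and one column per month, so the 216 observations form a single monthly series when the rows are read left to right. The sketch below, which is not part of the original text, shows one way to flatten the layout and to map a 1-based observation index back to its year and month; the names `START_YEAR`, `index_to_year_month`, and `rows` are illustrative assumptions, and only the first two years are copied in to keep the example short.

```python
START_YEAR = 1955  # first year of Series R
MONTHS_PER_YEAR = 12

# First two rows of the Series R table (1955 and 1956), January to December.
rows = [
    [2.63, 1.94, 3.38, 4.92, 6.29, 5.58, 5.50, 4.71, 6.04, 7.13, 7.79, 3.83],
    [3.83, 4.25, 5.29, 3.75, 4.67, 5.42, 6.04, 5.71, 8.13, 4.88, 5.42, 5.50],
]

# Reading each row left to right gives the observations in time order.
series = [value for row in rows for value in row]


def index_to_year_month(k):
    """Map a 1-based observation index to (year, month number)."""
    if k < 1:
        raise ValueError("observation index must be 1 or greater")
    year_offset, month_offset = divmod(k - 1, MONTHS_PER_YEAR)
    return START_YEAR + year_offset, month_offset + 1


# Observation 13 is the first value of the second row (January 1956).
print(index_to_year_month(13), series[12])
```

With all 18 rows included, the same flattening yields the full series of 18 × 12 = 216 values, and index 216 corresponds to December 1972.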